Tree circuit

ABSTRACT

An extended 4-input 2-output addition block (1a) is provided, along with 4-input 2-output addition blocks (2a to 2c), in the first stage of a tree circuit. Further, 4-input 2-output addition blocks (2d and 2e) are provided in the second stage and a 4-input 2-output addition block (2f) is provided in the third stage. Input signals of the addition blocks in the same stage arrive at the same time and the number of logical stages in a critical path of the tree circuit is reduced. Thus, parallel operation of the circuit is improved, to thereby ensure higher-speed operation of a multiplier.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a binary digital arithmetic unit, and more particularly to a tree circuit used in a parallel multiplier circuit that multiplies a multiplicand and a multiplier, both signed numbers in the two's complement representation, to obtain a product that is also a signed number in the two's complement representation.

2. Description of the Background Art

In general, recent microprocessors and DSPs (Digital Signal Processors) are equipped with a parallel multiplier for fast execution of multiplication instructions. The parallel multiplier circuit generates a plurality of partial products from the multiplier and the multiplicand given as input operands and adds up these partial products to obtain a multiplication result, i.e., a product. Accordingly, approaches to attain the following two objects have been proposed as techniques for speeding up the operation of the parallel multiplier circuit.

The first object is to reduce the number of partial products to be generated. To attain this object, the Booth algorithm, especially the secondary Booth algorithm, is typically used. The second object is to perform fast addition of a plurality of the partial products. To attain this object, a circuit system that achieves parallel operation of fast adder circuits is required.

A background-art fast multiplier circuit will be discussed, taking as an example a circuit that multiplies a 32-bit signed multiplicand X in the two's complement representation by a 32-bit signed multiplier Y in the two's complement representation to obtain a 64-bit signed product Z in the two's complement representation (the circuit is hereinafter abbreviated as "32×32 multiplier").

If a partial product were generated for each bit of the multiplier Y, thirty-two partial products would be generated and would all need to be added up. According to the secondary Booth algorithm, however, a set of three adjacent bits of the multiplier Y is dealt with as a unit to reduce the number of partial products. Thus, the first object is attained.

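Specifically, assuming that y_(i) (i=0 to 31) is 0 or 1, the multiplier Y is expressed as a 32-bit signed number in the two's complement representation as

$$Y = -y_{31}\cdot 2^{31} + \sum_{i=0}^{30} y_i\cdot 2^{i} = \sum_{j=0}^{15}\left(-2y_{2j+1} + y_{2j} + y_{2j-1}\right)\cdot 2^{2j}$$

(where y₋₁ ≡ 0)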

Thus, to obtain the product Z, it is only necessary to add up sixteen partial products P_(j) (j=0 to 15).

Table 1 shows a truth table of the secondary Booth algorithm.

TABLE 1
______________________________________________________________
y_(2j+1)   y_(2j)   y_(2j-1)   P_(j)           pp_(j)   pc_(j)
______________________________________________________________
0          0        0          0               0        0
0          0        1          +X·2^(j)        X        0
0          1        0          +X·2^(j)        X        0
0          1        1          +X·2^(j+1)      2X       0
1          0        0          -X·2^(j+1)      ˜2X      1
1          0        1          -X·2^(j)        ˜X       1
1          1        0          -X·2^(j)        ˜X       1
1          1        1          0               0        0
______________________________________________________________

In Table 1, "˜" denotes logical inversion, and there are eight combinations of possible values of three adjacent bits of the multiplier Y. Accordingly, the partial product P_(j) takes one of 0, +X·2^(j), +X·2^(j+1), -X·2^(j), -X·2^(j+1). In binary digital arithmetic using the two's complement representation, "multiplication of data by two" is achieved by shifting the whole data upward by one bit, and "sign inversion" is achieved by inverting all the bits of the data (by which the value of the first element pp_(j) of the partial product is inverted) and adding 1 to the least significant bit (by which the second element pc_(j) of the partial product takes "1"). Then, the partial product P_(j) is expressed as

$$P_j = (pp_j + pc_j)\cdot 2^{2j} \qquad (3)$$

Accordingly, to add up the sixteen partial products P₀ to P₁₅ generated according to the secondary Booth algorithm, it is necessary to add, for each of j=0 to 15, the first element pp_(j) of the partial product, which has thirty-three bits and whose least significant bit is the 2j-th bit (specifically, the bit positions range from 2j to 2j+32; the extra bit above the thirty-two bits of the data is needed because the 32-bit data may be multiplied by 2), and the second element pc_(j) of the partial product, which has one bit located on the 2j-th bit (in other words, 2^(2j) represents the scale of the first and second elements).
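The decomposition of each partial product into a first element pp_(j) and a second element pc_(j) can be illustrated with a short software sketch. The following Python code is only an illustration, not part of the circuit described here; the function names `to_signed`, `booth_elements` and `product_from_elements` are hypothetical, and the scale 2^(2j) is taken from equation (3).

```python
def to_signed(value: int, width: int) -> int:
    """Interpret a width-bit pattern as a two's complement number."""
    return value - (1 << width) if (value >> (width - 1)) & 1 else value


def booth_elements(x: int, y: int, n: int = 32):
    """Return [(pp_j, pc_j)] for n-bit signed x and y, following Table 1."""
    y_bits = y & ((1 << n) - 1)      # two's complement bit pattern of Y
    mask = (1 << (n + 1)) - 1        # pp_j is n + 1 bits wide (33 for n = 32)
    elements = []
    for j in range(n // 2):
        y_hi = (y_bits >> (2 * j + 1)) & 1
        y_mid = (y_bits >> (2 * j)) & 1
        y_lo = (y_bits >> (2 * j - 1)) & 1 if j > 0 else 0   # y_(-1) = 0
        digit = -2 * y_hi + y_mid + y_lo                      # -2, -1, 0, 1 or 2
        multiple = abs(digit) * x                             # 0, X or 2X
        if digit < 0:
            elements.append((~multiple & mask, 1))            # inverted bits, pc_j = 1
        else:
            elements.append((multiple & mask, 0))             # plain bits, pc_j = 0
    return elements


def product_from_elements(elements, n: int = 32) -> int:
    """Sum (pp_j + pc_j) * 2^(2j) over all j, as in equation (3)."""
    return sum((to_signed(pp, n + 1) + pc) << (2 * j)
               for j, (pp, pc) in enumerate(elements))
```

For 32-bit operands `booth_elements` returns sixteen pairs, and `product_from_elements(booth_elements(-7, 9))` evaluates to -63, the ordinary product. The sketch sign-extends each 33-bit first element in software; how the hardware handles sign extension is not modeled.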

To attain the second object, specifically, to perform fast addition of partial products, a carry-save technique, a Wallace-Tree technique and the like are typically used as the circuit system to achieve parallel operation of the fast adder circuits. Using any one of these techniques, a plurality of (sixteen here) intermediate sums are added in the form of a tournament while being compressed, to ultimately provide two intermediate sums (referred to as "eventual intermediate sums" hereinafter). Carry signals generated during the process of obtaining the eventual intermediate sums are postponed to the subsequent-stage addition. Propagation of the carry signals to higher-order bits is thus performed in parallel, and the critical path (the path that determines the operating speed of the circuit) is shortened on the whole, to ensure fast addition.
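The tournament-style compression can be sketched at word level as follows. This is a functional illustration only, assuming 64-bit words and building each 4-input 2-output step from two carry-save steps; the names `csa_3_2`, `compress_4_2` and `tournament` are hypothetical, and no gate-level structure or timing is modeled.

```python
MASK64 = (1 << 64) - 1   # results are kept modulo 2^64, like a 64-bit product


def csa_3_2(a: int, b: int, c: int):
    """Carry-save step: three words in, (sum word, carry word) out."""
    s = a ^ b ^ c
    carry = ((a & b) | (b & c) | (a & c)) << 1   # carries move up one bit
    return s & MASK64, carry & MASK64


def compress_4_2(a: int, b: int, c: int, d: int):
    """Four words in, two words out, with the same total modulo 2^64."""
    s1, c1 = csa_3_2(a, b, c)
    return csa_3_2(s1, c1, d)


def tournament(words):
    """Compress groups of four words into two until only two words remain.

    Assumes the number of words is 4, 8, 16, ... so every stage divides evenly.
    Returns the final pair and the number of stages used.
    """
    stages = 0
    while len(words) > 2:
        compressed = []
        for i in range(0, len(words), 4):
            compressed.extend(compress_4_2(*words[i:i + 4]))
        words = compressed
        stages += 1
    return words, stages
```

With sixteen input words the loop runs three times (16 to 8 to 4 to 2 words), and only the final pair needs a carry-propagating addition.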

Final addition of the two eventual intermediate sums provides the product of the multiplicand and the multiplier. The final addition of the two sums, each consisting of a plurality of bits, is performed at high speed by using, e.g., a carry-lookahead system. The final addition will not be discussed since it is a well-known technique.

The technique, for attaining the second object, of adding a plurality of partial products in the form of a tournament while sequentially compressing them to eventually generate the two eventual intermediate sums will be examined in detail, and a problem of the background art will then be presented.

FIG. 13 is a block diagram of the background art implementing the Wallace-Tree technique. In this figure, 4-input 2-output addition blocks 22a to 22g are interconnected in a tree structure. Further, a 3-input 2-output addition block 24a is provided to receive an output of the 4-input 2-output addition block 22g.

FIGS. 14A to 14C are block diagrams cooperatively showing the detail of FIG. 13. FIG. 14 is a schematic diagram showing the connection between FIGS. 14A to 14C. FIG. 14A is continuous with FIG. 14B at a virtual line Q19--Q19 and FIG. 14B is continuous with FIG. 14C at a virtual line Q20--Q20. The width of each addition block corresponds to the bit width thereof and the position in a horizontal direction corresponds to the bit position.

FIGS. 15A to 15C are block diagrams cooperatively illustrating a configuration of the 4-input 2-output addition block 22a. FIG. 15A is continuous with FIG. 15B at a virtual line Q22--Q22 and FIG. 15B is continuous with FIG. 15C at a virtual line Q23--Q23. The 4-input 2-output addition block 22a consists of thirty-five 4-input 2-output adders 200, each for one bit, which are connected in series. A carry-out Co of the 4-input 2-output adder 200 on each bit position becomes a carry-in Ci of the 4-input 2-output adder 200 on the higher-next bit position. If the carry-out Co of a one-bit 4-input 2-output adder does not depend on its carry-in Ci, the carry-out Co is not propagated beyond the next bit within the 4-input 2-output addition block consisting of the 4-input 2-output adders connected in series.

In the background-art addition of partial products shown in FIGS. 14A to 14C, the 3-input 2-output addition block 24a in the fourth stage is needed only for adding the one-bit second element pc₁₅ of the partial product to the lower output so₂₇ and the upper output co₂₇ of the 4-input 2-output addition block 22g. The second element pc₁₅ of the partial product is thus an obstacle to speeding up the multiplier, and the 3-input 2-output addition block 24a is an obstacle to high integration of the circuit.

Specifically, the speed of the multiplier is estimated as follows. The 3-input 2-output addition block 24a consists of 3-input 2-output adders 400, each for one bit, connected in series, one of which is shown in the circuit diagram of FIG. 16. The truth table of the 3-input 2-output adder 400 is shown in Table 2.

TABLE 2
______________________________________
A     B     C     SO    CO
______________________________________
0     0     0     0     0
0     0     1     1     0
0     1     0     1     0
0     1     1     0     1
1     0     0     1     0
1     0     1     0     1
1     1     0     0     1
1     1     1     1     1
______________________________________

In general, the delay time of one stage of exclusive OR gate (referred to as "XOR" hereinafter) is larger than that of an AND gate, an OR gate or a compound gate, and is equivalent to about two stages thereof. For example, the critical path of the 3-input 2-output adder 400 of FIG. 16 goes through two stages of XORs.
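The behavior listed in Table 2 is that of a one-bit full adder. A minimal Python sketch, using the standard full-adder equations rather than the specific gate arrangement of FIG. 16 (which is not reproduced here), is shown below; the function name `adder_3_2` is hypothetical.

```python
def adder_3_2(a: int, b: int, c: int):
    """One-bit 3-input 2-output adder: returns (SO, CO) as in Table 2."""
    so = (a ^ b) ^ c                      # two XORs in series on the sum path
    co = (a & b) | (b & c) | (a & c)      # majority of the three inputs
    return so, co


# Rows of Table 2, keyed by (A, B, C) with values (SO, CO).
TABLE_2 = {
    (0, 0, 0): (0, 0), (0, 0, 1): (1, 0), (0, 1, 0): (1, 0), (0, 1, 1): (0, 1),
    (1, 0, 0): (1, 0), (1, 0, 1): (0, 1), (1, 1, 0): (0, 1), (1, 1, 1): (1, 1),
}
```

In every row A + B + C equals SO + 2·CO, which is why the upper output CO carries twice the weight of the lower output SO.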

FIG. 17 is a block diagram showing a configuration of the 4-input 2-output adder 200. The 4-input 2-output adder 200 for one bit can be constituted of two 3-input 2-output adders 400 for one bit. In this case, the critical path of the adder goes through four stages of XORs.

With a devised configuration of the 4-input 2-output adder 200, the delay time can be further reduced. Table 3 shows a truth table of an exemplary function that the 4-input 2-output adder 200 should satisfy.

TABLE 3
______________________________________
A     B     C     D     SO     Co    CO
______________________________________
0     0     0     0     Ci     0     0
0     0     0     1     ˜Ci    0     Ci
0     0     1     0     ˜Ci    0     Ci
0     0     1     1     Ci     0     1
0     1     0     0     ˜Ci    0     Ci
0     1     0     1     Ci     1     0
0     1     1     0     Ci     1     0
0     1     1     1     ˜Ci    1     Ci
1     0     0     0     ˜Ci    0     Ci
1     0     0     1     Ci     1     0
1     0     1     0     Ci     1     0
1     0     1     1     ˜Ci    1     Ci
1     1     0     0     Ci     0     1
1     1     0     1     ˜Ci    1     Ci
1     1     1     0     ˜Ci    1     Ci
1     1     1     1     Ci     1     1
______________________________________

FIG. 18 is a circuit diagram of an exemplary circuit which satisfies the truth table of Table 3. The critical path is the path to obtain the output SO of the 4-input 2-output adder 200. The output SO is the exclusive OR of five signals, i.e., the inputs A, B, C, D and the carry-in signal Ci. As can be seen from FIG. 18, the exclusive OR of the inputs A and B and the exclusive OR of the inputs C and D are processed in parallel, so that the critical path goes through three stages of XORs. For convenience, the discussion below assumes that the delay of the 3-input 2-output adder 400 is two stages of XORs and that of the 4-input 2-output adder 200 is three stages of XORs.
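The function of Table 3 can be written compactly in software. The Boolean expressions in the following Python sketch were derived from the rows of Table 3 and are an assumption of this illustration; they are not claimed to match the gates of FIG. 18, and the function name `adder_4_2` is hypothetical.

```python
def adder_4_2(a: int, b: int, c: int, d: int, ci: int):
    """One-bit 4-input 2-output adder: returns (SO, Co, CO) as in Table 3."""
    parity = (a ^ b) ^ (c ^ d)        # A^B and C^D can be formed in parallel
    so = parity ^ ci                  # third XOR stage on the critical path
    co = (a | b) & (c | d)            # carry-out to the next bit, independent of Ci
    if parity:
        upper = ci                    # odd number of 1s among A..D: CO follows Ci
    else:
        upper = (a & b) | (c & d)     # even number of 1s: CO fixed by the inputs
    return so, co, upper
```

For all thirty-two input combinations, A + B + C + D + Ci equals SO + 2·(Co + CO), and Co never depends on Ci, which is the property that keeps the carry from rippling beyond the next bit.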

Since the carry-out Co is not propagated beyond the next bit as mentioned above, the delay times of the addition blocks 22a to 22g and 24a directly depend on the delay times of the adders 200 and 400.

Provided that the multiplicand X and the multiplier Y as inputs of the multiplier are inputted at the same time, the values of the first elements pp₀ to pp₁₅ and the second elements pc₀ to pc₁₅ of the partial products generated according to the secondary Booth algorithm are determined at the same time.

In these addition blocks of FIGS. 14A to 14C, the addition is performed in the order of the first stage of the tree circuit (the 4-input 2-output addition blocks 22a to 22d), the second stage (the 4-input 2-output addition blocks 22e and 22f), the third stage (the 4-input 2-output addition block 22g) and the fourth stage (the 3-input 2-output addition block 24a). Accordingly, the delay time from the determination of the first elements pp₀ to pp₁₅ and the second elements pc₀ to pc₁₅ of the partial products to the determination of the lower output so₂₈ and the upper output co₂₈ of the 3-input 2-output addition block 24a as the two eventual intermediate sums is eleven (=3×3+2) stages of XORs since the critical path goes through three stages of 4-input 2-output addition blocks and one stage of 3-input 2-output addition block.

The tree circuit for adding up sixteen partial products to generate the two eventual intermediate sums, which is constituted mainly of 4-input 2-output addition blocks in FIG. 14A, may also be constituted of 3-input 2-output addition blocks in stages other than the final stage of the tree circuit.

FIG. 19 is a block diagram showing a configuration of a tree circuit where the 3-input 2-output addition blocks account for a larger part. FIGS. 20A to 20D are block diagrams cooperatively showing the detail of FIG. 19. FIG. 20 is a schematic diagram showing the connection between FIGS. 20A to 20D. FIG. 20A is continuous with FIG. 20B at a virtual line Q29--Q29 and FIG. 20B is continuous with FIG. 20C at a virtual line Q31--Q31. Like FIGS. 14A to 14C, the width of each addition block corresponds to the bit width thereof and the position in a horizontal direction corresponds to the bit position.

Outputs from 4-input 2-output addition blocks 32a to 32d are inputted to 3-input 2-output addition blocks 34a to 34c, outputs from the 3-input 2-output addition blocks 34a to 34c are inputted to 3-input 2-output addition blocks 34d and 34e, and outputs from the 3-input 2-output addition blocks 34d and 34e are inputted to a 4-input 2-output addition block 32e. The 4-input 2-output addition block 32e outputs a lower output so₄₀ and an upper output co₄₀ as the two eventual intermediate sums.

Unlike the tree circuit of FIG. 13, the second elements pc_(j) are collected in the order of j and inputted to the 4-input 2-output addition block 32a as ppc. That is expressed as

$$\mathit{ppc} = \sum_{j=0}^{15} pc_j\cdot 2^{2j}$$

In FIGS. 19 and 20A to 20D, the addition is performed in the order of the first stage of the tree circuit (the 4-input 2-output addition blocks 32a to 32d), the second stage (the 3-input 2-output addition blocks 34a to 34c), the third stage (the 3-input 2-output addition blocks 34d and 34e) and the fourth stage (the 4-input 2-output addition block 32e). Accordingly, the delay time from the determination of the first elements pp₀ to pp₁₅ and the second elements pc₀ to pc₁₅ of the partial products to the determination of the lower output so₄₀ and the upper output co₄₀ of the 4-input 2-output addition block 32e as the two eventual intermediate sums is ten (=3×2+2×2) stages of XORs since the critical path goes through two stages of 4-input 2-output addition blocks and two stages of 3-input 2-output addition blocks. Thus, the delay time is improved in this configuration as compared with that of FIGS. 13 and 14A to 14C.
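The two delay estimates (eleven stages for the tree of FIG. 13 and ten stages for the tree of FIG. 19) follow from adding the per-block delays along the critical path. A minimal Python sketch of that arithmetic, using the convention adopted above of three XOR stages per 4-input 2-output block and two per 3-input 2-output block, is given below; the names `XOR_STAGES` and `critical_path` are hypothetical.

```python
# XOR stages per addition block, as assumed for convenience in the text.
XOR_STAGES = {"4-2": 3, "3-2": 2}


def critical_path(stage_kinds):
    """Total XOR stages along a path that passes through the listed blocks."""
    return sum(XOR_STAGES[kind] for kind in stage_kinds)


FIG_13_PATH = ["4-2", "4-2", "4-2", "3-2"]   # blocks 22a-22d, 22e/22f, 22g, 24a
FIG_19_PATH = ["4-2", "3-2", "3-2", "4-2"]   # blocks 32a-32d, 34a-34c, 34d/34e, 32e

print(critical_path(FIG_13_PATH))   # 11
print(critical_path(FIG_19_PATH))   # 10
```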

However, this configuration has the disadvantages that the number of addition blocks increases by two and the circuit scale is enlarged. This is because the 3-input 2-output addition block, though its delay time is shorter than that of the 4-input 2-output addition block, deals with one fewer input in parallel.

FIG. 21 is a block diagram of a tree circuit for generating the eventual intermediate sums in a circuit that multiplies a multiplicand and a multiplier, both 24-bit signed numbers in the two's complement representation, to obtain a product that is a 48-bit signed number in the two's complement representation. In this circuit, twelve partial products are generated according to the secondary Booth algorithm and added up in the form of a tournament while being compressed, to eventually provide the two eventual intermediate sums.

The tree circuit is constituted of 4-input 2-output addition blocks 42a to 42e and a 3-input 2-output addition block 44a. In these addition blocks, the addition is performed in the order of the first stage of the tree circuit (the 4-input 2-output addition blocks 42a to 42c), the second stage (the 4-input 2-output addition block 42d and the 3-input 2-output addition block 44a) and the third stage (the 4-input 2-output addition block 42e), and ppc is expressed as

$$\mathit{ppc} = \sum_{j=0}^{11} pc_j\cdot 2^{2j}$$

The delay time of a path through three stages of 4-input 2-output addition blocks (through the addition blocks 42a (or 42b), 42d and 42e) is longer than that of a path through two stages of 4-input 2-output addition blocks and one stage of 3-input 2-output addition block (through the addition blocks 42c, 44a and 42e). Accordingly, the delay time from the determination of the first elements pp₀ to pp₁₁ and the second elements pc₀ to pc₁₁ of the partial products to the determination of the lower output so₄₆ and the upper output co₄₆ of the 4-input 2-output addition block 42e as the two eventual intermediate sums is nine stages of XORs, which corresponds to three stages of 4-input 2-output addition blocks.

As discussed above, the times for determination of input data of the 4-input 2-output addition block 42e in the two cases are not the same. Specifically, the lower output so₄₅ and the upper output co₄₅ of the 3-input 2-output addition block 44a are determined earlier than the lower output so₄₄ and the upper output co₄₄ of the 4-input 2-output addition block 42d by one stage of XOR. Further, the first element pp₁₁ of the input data of the 3-input 2-output addition block 44a is determined earlier than the lower output so₄₃ and the upper output co₄₃ of the 4-input 2-output addition block 42c by three stages of XORs.
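The non-uniform timing in the tree of FIG. 21 can be reproduced with a small arrival-time calculation. The Python sketch below is an illustration at block level only; the dictionary `FIG_21` encodes the connectivity described above, the label "primary" stands for any partial-product input that is settled at time 0, and the function name `arrival_times` is hypothetical.

```python
DELAY = {"4-2": 3, "3-2": 2}   # XOR stages per block, as assumed in the text

# Block name -> (kind, inputs). Names that are not blocks are primary inputs.
FIG_21 = {
    "42a": ("4-2", ["primary"]),
    "42b": ("4-2", ["primary"]),
    "42c": ("4-2", ["primary"]),
    "42d": ("4-2", ["42a", "42b"]),
    "44a": ("3-2", ["42c", "pp11"]),
    "42e": ("4-2", ["42d", "44a"]),
}


def arrival_times(blocks):
    """Time, in XOR stages, at which each block's outputs are determined."""
    times = {}

    def settle(name):
        if name not in blocks:          # primary input, ready at time 0
            return 0
        if name not in times:
            kind, inputs = blocks[name]
            times[name] = max(settle(i) for i in inputs) + DELAY[kind]
        return times[name]

    for name in blocks:
        settle(name)
    return times


times = arrival_times(FIG_21)
print(times["42d"] - times["44a"])   # 1: block 44a is ready one XOR stage early
print(times["42c"] - 0)              # 3: pp11 waits three XOR stages for block 42c
print(times["42e"])                  # 9: overall delay of the tree
```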

In the background art, the circuit operation of the tree circuit is performed with low parallelism in some cases depending on the bit width of the input data for multiplication. In other words, disadvantageously, speeding-up of the multiplier is not achieved because the timing of determining the input data of the circuit blocks constituting the tree circuit is not uniform.

SUMMARY OF THE INVENTION

The present invention is directed to a tree circuit. According to a first aspect of the present invention, the tree circuit, which performs a tournament addition on the basis of a plurality of partial products generated according to the Booth algorithm, generating intermediate sums to be compressed, to output a pair of eventual intermediate sums, comprises: regular addition blocks for adding a plurality of plural-number-bit data to output a pair of the intermediate sums; and an extended addition block for adding a plurality of plural-number-bit data and one-bit data to output a pair of the intermediate sums.

According to a second aspect of the present invention, in the tree circuit of the first aspect, each of the plurality of partial products is expressed as a product obtained by multiplying a sum of a first element of a plurality of bits and a second element of one bit by a scale, and the extended addition block receives the plurality of partial products and further receives the second element which belongs to one of the plurality of partial products other than those inputted thereto.

According to a third aspect of the present invention, in the tree circuit of the second aspect, the second element inputted to the extended addition block belongs to the partial product which has the largest scale among the plurality of partial products.

According to a fourth aspect of the present invention, in the tree circuit of the third aspect, the partial product which has the smallest scale among the plurality of partial products is inputted to the extended addition block.

According to a fifth aspect of the present invention, in the tree circuit of the fourth aspect, the extended addition block has extended adders, the number of which is a predetermined number, located on a specific bit position, which is the bit position of the second element inputted thereto, and higher; and regular adders located lower than the specific bit position, and the extended adders each have one more upward-propagation output for outputting data to the higher-next bit as compared with the regular adders which constitute the regular addition block.

According to a sixth aspect of the present invention, in the tree circuit of the fifth aspect, the extended addition block further has an adder higher than the extended adders, and the adder located higher next to the highest one of the extended adders receives one of the upward-propagation outputs as an input other than a carry-in.

According to a seventh aspect of the present invention, in the tree circuit of the fifth aspect, the extended adders each have four inputs other than the upward-propagation outputs given from the lower-next bit position, and one of the upward-propagation outputs takes either of different values depending on whether all of the four inputs are "1" or not.

According to an eighth aspect of the present invention, in the tree circuit of the seventh aspect, the upward-propagation outputs propagating between a plurality of the extended adders are generated as a pair of pseudo carry-outs and can be expressed as results of two predetermined arithmetic operations performed for a pair of carry-outs generated in the regular adders, and the carry-outs are commutative in both the two predetermined arithmetic operations.

According to a ninth aspect of the present invention, in the tree circuit of the eighth aspect, the extended adder located on the specific bit position receives a carry-out from the lower-next bit position and the second element inputted to the extended addition block and propagates the pseudo carry-outs to the extended adder located on the higher-next bit position.

According to a tenth aspect of the present invention, in the tree circuit of the ninth aspect, the extended addition block further has a regular adder higher than the extended adders, and the highest one of the extended adders receives the pair of pseudo carry-outs from the lower-next bit position and outputs a pair of carry-outs to the regular adder located on the higher-next bit position.

In the tree circuit of the first aspect, the extended addition block receives one more bit of data than the regular addition block. Therefore, the tree circuit needs no other addition block for adding the two compressed intermediate sums and this one bit to obtain the eventual intermediate sums.

In the tree circuit of the second aspect, since the extended addition block is located in the first stage of the tree circuit, the intermediate sums are given to the second stage of the tree circuit (where the intermediate sums obtained in the first stage are further added) with their timing aligned. Therefore, higher-speed processing can be achieved by aligning the timing of obtaining the intermediate sums, without an increase in circuit scale.

A larger-scale configuration is needed on the bit position of the second element inputted to the extended addition block and higher, as compared with the configuration on the other bit positions. In the tree circuit of the third aspect, the second element whose bit position is the highest is selected to be inputted to the extended addition block, thereby suppressing an increase in configuration scale of the extended addition block.

In the tree circuit of the fourth aspect, the addition block in which the bit positions where the largest number of partial products are added (in other words, where the number of partial products to be added is equal to the number of partial products inputted) are the lowest serves as the extended addition block. On the bit positions at and above the position where the number of partial products to be added becomes smaller than the number of partial products inputted, it is possible to deal with the inputted second element without enlarging the configuration scale of the extended addition block. In other words, although the configuration scale of the extended addition block is enlarged on the bit position of the second element inputted thereto and higher, the enlargement of that portion can be limited, and the enlargement in configuration scale of the extended addition block as a whole can thus be suppressed.

To accept the second element inputted to the extended addition block, the adder on the specific bit position may have one more bit of upward-propagation output than the adders on the lower bit positions. Since the upward-propagation output is propagated to the bit position higher than the specific bit position, one more bit of upward-propagation output is needed also on the bit positions higher than the specific bit position. For that purpose, in the tree circuit of the fifth aspect, the extended adder having one more bit of upward-propagation output is provided on the specific bit position and higher.

In the tree circuit of the sixth aspect, a regular adder may be employed for the adder located higher next to the extended adder on the most significant order (the most significant-order extended adder) since it receives the upward-propagation output from the most significant-order extended adder by an input other than the carry-in.

In the tree circuit of the seventh aspect, the configuration to obtain one of the upward-propagation outputs is simplified, and therefore it is possible to suppress an enlargement in configuration scale of the extended adder and further in configuration scale of the extended addition block.

In the tree circuit of the eighth aspect, the upward-propagation output need not have the meaning of a carry, and the pseudo carry-outs are propagated to the higher bit position to simplify the configuration of the extended adder.

In the tree circuit of the ninth aspect, an extended adder for adjusting between the regular adder located lower than the specific bit position and the extended adder which receives the pseudo carry-outs is provided, which allows regular addition below the specific bit position and propagation of the pseudo carry-outs between the extended adders at the same time.

In the tree circuit of the tenth aspect, an extended adder for propagating the pseudo carry-outs to the higher bit position and an extended adder for adjusting between this propagating extended adder and the regular adder located higher are provided, which allows the regular addition in the regular adder located higher than the extended adders while propagating the pseudo carry-outs between the extended adders.

Accordingly, an object of the present invention is to improve the parallel operation of the parallel multiplier circuit using the secondary Booth algorithm and to speed up the multiplier without a remarkable increase in circuit scale.

These and other objects, features, aspects and advantages of the present invention will become more apparent from the following detailed description of the present invention when taken in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram showing a first preferred embodiment of the present invention;

FIG. 2 is a schematic diagram showing the connection between FIGS. 2A to 2C;

FIGS. 2A to 2C are block diagrams cooperatively showing the detail of FIG. 1;

FIGS. 3A to 3C are block diagrams cooperatively showing a configuration of an extended 4-input 2-output addition block 1a;

FIG. 4 is a circuit diagram of the first example of a configuration of an extended 4-input 2-output adder 100;

FIG. 5 is a circuit diagram of the second example of a configuration of an extended 4-input 2-output adder 100;

FIG. 6 is a circuit diagram illustrating a configuration of an extended 4-input 2-output adder 111;

FIG. 7 is a block diagram showing part of the configuration of the extended 4-input 2-output addition block 1a;

FIG. 8 is a circuit diagram illustrating a configuration of an extended 4-input 2-output adder 110;

FIG. 9 is a circuit diagram illustrating a configuration of an extended 4-input 2-output adder 112;

FIG. 10 is a block diagram showing a third preferred embodiment of the present invention;

FIGS. 11A and 11B are block diagrams cooperatively showing a configuration of an extended 3-input 2-output addition block 13a;

FIG. 12 is a circuit diagram illustrating a configuration of an extended 3-input 2-output adder 300;

FIG. 13 is a block diagram of a configuration of a tree circuit in the background art;

FIG. 14 is a schematic diagram showing the connection between FIGS. 14A to 14C;

FIGS. 14A to 14C are block diagrams cooperatively showing the detail of FIG. 13;

FIGS. 15A to 15C are block diagrams cooperatively showing a configuration of a 4-input 2-output addition block 22a;

FIG. 16 is a circuit diagram of an exemplary configuration of a 3-input 2-output adder 400;

FIG. 17 is a block diagram of an exemplary configuration of a 4-input 2-output adder 200;

FIG. 18 is a circuit diagram of an example of the 4-input 2-output adder 200;

FIG. 19 is a block diagram of a configuration of the tree circuit in the background art;

FIG. 20 is a schematic diagram showing the connection between FIGS. 20A to 20D;

FIGS. 20A to 20D are block diagrams cooperatively showing the detail of FIG. 19; and

FIG. 21 is a block diagram illustrating the tree circuit in the background art.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

The First Preferred Embodiment

FIG. 1 is a block diagram showing part of a configuration of a multiplier in accordance with a first preferred embodiment of the present invention. The multiplier and the multiplicand are 32-bit signed numbers in the two's complement representation, and sixteen partial products P₀ to P₁₅ are obtained according to the secondary Booth algorithm. This figure does not show the function of generating these partial products but schematically shows a tree circuit which compresses the intermediate sums to eventually generate two eventual intermediate sums. As discussed in the background art, a partial product P_(j) depends on the first element pp_(j) of 33-bit width, the second element pc_(j) of 1-bit width and 2j representing the least significant bit position.

The tree circuit of the first preferred embodiment is constituted of a circuit block 1a for adding, in parallel, four input data each of a plurality of bits and one input data of one bit (the circuit block will be hereinafter referred to as "extended 4-input 2-output addition block") and 4-input 2-output addition blocks 2a to 2f.

The second element pc₁₅ of the partial product, which is given to the 3-input 2-output addition block 24a in the background art, is applied to the extended 4-input 2-output addition block 1a in the first preferred embodiment. That eliminates the need for the 3-input 2-output addition block 24a.

The extended 4-input 2-output addition block 1a receives the second element pc₁₅ and the first elements pp₀ to pp₃ of the partial products and outputs an upper output co₁ and a lower output so₁. The 4-input 2-output addition block 2a receives the first elements pp₄ to pp₇ and outputs an upper output co₂ and a lower output so₂ as intermediate sums. The 4-input 2-output addition block 2b receives the first elements pp₈ to pp₁₁ and outputs an upper output co₃ and a lower output so₃ as intermediate sums. The 4-input 2-output addition block 2c receives the first elements pp₁₂ to pp₁₅ and outputs an upper output co₄ and a lower output so₄ as intermediate sums. The 4-input 2-output addition block 2d receives the upper outputs co₁ and co₂ and the lower outputs so₁ and so₂ and outputs an upper output co₅ and a lower output so₅ as intermediate sums. The 4-input 2-output addition block 2e receives the upper outputs co₃ and co₄ and the lower outputs so₃ and so₄ and outputs an upper output co₆ and a lower output so₆ as intermediate sums. The 4-input 2-output addition block 2f receives the upper outputs co₅ and co₆ and the lower outputs so₅ and so₆ and outputs an upper output co₇ and a lower output so₇ as eventual intermediate sums. The upper output co₇ and the lower output so₇ are eventually added up by a final addition block (not shown) to provide a multiplication result. The above discussion gives an outline, and a detailed discussion will be presented referring to FIGS. 2A to 2C.
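The block connections just outlined can be checked for timing with a short block-level sketch. The Python code below is an illustration under a stated assumption: the extended 4-input 2-output addition block 1a is given the same delay of three XOR stages as a regular 4-input 2-output block, which this passage does not itself establish. The names `FIG_1` and `arrival_and_skew` are hypothetical, and bits that bypass the first stage are not modeled.

```python
# ASSUMPTION: the extended block is as fast as a regular 4-2 block (3 XOR stages).
DELAY = {"4-2": 3, "ext 4-2": 3}

# Block name -> (kind, inputs). "primary" marks inputs settled at time 0.
FIG_1 = {
    "1a": ("ext 4-2", ["primary"]),
    "2a": ("4-2", ["primary"]),
    "2b": ("4-2", ["primary"]),
    "2c": ("4-2", ["primary"]),
    "2d": ("4-2", ["1a", "2a"]),
    "2e": ("4-2", ["2b", "2c"]),
    "2f": ("4-2", ["2d", "2e"]),
}


def arrival_and_skew(blocks):
    """Return (output time per block, input arrival skew per block)."""
    times, skew = {}, {}

    def settle(name):
        if name not in blocks:              # primary input, ready at time 0
            return 0
        if name not in times:
            kind, inputs = blocks[name]
            arrivals = [settle(i) for i in inputs]
            skew[name] = max(arrivals) - min(arrivals)
            times[name] = max(arrivals) + DELAY[kind]
        return times[name]

    for name in blocks:
        settle(name)
    return times, skew


times, skew = arrival_and_skew(FIG_1)
print(times["2f"])          # 9 XOR stages under the stated assumption
print(max(skew.values()))   # 0: every block sees its inputs at the same time
```

Under that assumption the eventual intermediate sums are available after three block stages, with no fourth stage reserved for the single bit pc₁₅.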

FIGS. 2A to 2C are block diagrams cooperatively showing the detail of FIG. 1. FIG. 2 is a schematic diagram showing the connection between FIGS. 2A to 2C. FIG. 2A is continuous with FIG. 2B at a virtual line Q2--Q2 and FIG. 2B is continuous with FIG. 2C at a virtual line Q3--Q3. The width of each addition block corresponds to the bit width thereof and the position in a horizontal direction corresponds to the bit position.

The second to thirty-second bits of the first element pp₀ <32:0> of the partial product P₀, all bits of the first element pp₁ <34:2> and the second element pc₁ of the partial product P₁, all bits of the first element pp₂ <36:4> and the second element pc₂ of the partial product P₂, the sixth to thirty-sixth bits of the first element pp₃ <38:6> of the partial product P₃ and the second element pc₁₅ of the partial product P₁₅ are inputted to the extended 4-input 2-output addition block 1a, adjusting the bit positions. (<u:v> indicates that the data represented by the preceding characters range from the v-th to the u-th bits, counted from the zeroth bit, i.e., the least significant bit of the multiplication result, and the bit position is expressed as n-th counted from the least significant order of the multiplication result.)

The second element pc₁ of the partial product P₁ and the second element pc₂ of the partial product P₂ are dealt with as pseudo lower bits of the first element pp₃ of the partial product P₃.

The second element pc₀ of the partial product P₀ is propagated to the final addition block (not shown) since no other data are located on its bit position (the zeroth bit).

The thirty-seventh and thirty-eighth bits of the first element pp₃ of the partial product P₃ are dealt with as pseudo upper bits of the lower output so₁ <36:2> of the extended 4-input 2-output addition block 1a and propagated to the 4-input 2-output addition block 2d since the extended 4-input 2-output addition block 1a does not cover their bit positions.

The second element pc₃ is not added in the extended 4-input 2-output addition block 1a and is propagated to the 4-input 2-output addition block 2d since four data to be given to its bit position (the sixth bit) already exist.

The zeroth and first bits of the first element pp₀ of the partial product P₀ are dealt with as pseudo lower bits of the lower output so₁ of the extended 4-input 2-output addition block 1a since the extended 4-input 2-output addition block 1a does not cover their bit positions.

Since the first elements pp₂ and pp₃ are not given to the bit positions of the second elements pc₁ and pc₂ respectively, the background-art 4-input 2-output adders 200 may be used on these positions of the extended 4-input 2-output addition block 1a. However, the four first elements pp₀ to pp₃ are also located on the bit position of the second element pc₁₅, i.e., the thirtieth bit. Therefore, the extended 4-input 2-output addition block 1a must include at least a 6-input adder on this bit position, specifically for the four first elements pp₀ <30>, pp₁ <30>, pp₂ <30>, pp₃ <30> (<w> indicates a bit position), the second element pc₁₅ and a carry-out Co of the 4-input 2-output adder 200 located on the twenty-ninth bit.

Furthermore, the adder located on the thirtieth bit (referred to as "extended 4-input 2-output adder" hereinafter) has to output two carry-outs. Since six 1-bit data are inputted, the addition result is six in decimal notation at the maximum. To propagate the carry-out only to the next bit, arithmetic operation is executed using a carry with weight of 2¹ with respect to the bit position of the input data but cannot be executed using a carry with weight of 2². Naturally, for an addition result of odd number in decimal notation, an output with weight of 2⁰ with respect to the bit position of the input data (i.e., on the same position) is also needed. Therefore, the extended 4-input 2-output adder outputs the lower output SO with weight of 2⁰ and the upper output CO with weight of 2¹ (which correspond to the lower output so for one bit and the upper output co for one bit, respectively) and further propagates the first carry-out Co1 and the second carry-out Co2, both having weight of 2¹, to the adder located on the higher-next bit.

The first principle is that the adder located higher than the bit position of the second element pc₁₅ has to receive the four first elements pp₀ to pp₃ and the first and second carry-outs Co1 and Co2 given from the extended 4-input 2-output adder in the lower-next position, and hence the extended 4-input 2-output adder should be employed therefor.

In the extended 4-input 2-output addition block 1a located on the thirty-third bit or higher, one input is not needed since the most significant bit of the first element pp₀ is located on the thirty-second bit. Accordingly, the second principle is that the extended 4-input 2-output addition block 1a may be constituted of regular 4-input 2-output adders on the thirty-third bit or higher even if there are two carry-outs from the lower positions.

According to the first and second principles, the 4-input 2-output adders 200 used in the background-art 4-input 2-output addition block have to be replaced by the extended 4-input 2-output adders for the bit position of the second element pc₁₅ of the highest partial product P₁₅ and higher and on the most significant bit position of the first element pp₀ of the lowest partial product P₀ and lower.

Furthermore, due to the complement to the first and second principles according to the "one-addition technique" discussed later, the extended 4-input 2-output adder is needed on still higher bit positions. Detailed discussion on the configuration of the extended 4-input 2-output addition block and brief discussion on the above complement will be presented below.
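As an illustrative sketch only, the range of bit positions selected by the first and second principles may be computed as below. The sketch assumes, as inferred from the ranges quoted above (pp₀ <32:0>, pp₁ <34:2>, pp₂ <36:4>, pp₃ <38:6>, with pc₁₅ on the thirtieth bit), that the first element pp_j occupies bits 2j to 2j+32 and the second element pc_j sits on bit 2j. All function names are hypothetical.

```python
def first_element_range(j):
    """Assumed bit range <u:v> of the first element pp_j, returned as (u, v)."""
    return 2 * j + 32, 2 * j


def second_element_bit(j):
    """Assumed bit position of the second element pc_j."""
    return 2 * j


def first_elements_at(bit, rows=(0, 1, 2, 3)):
    """Indices of the first elements that have data on the given bit position."""
    found = []
    for j in rows:
        u, v = first_element_range(j)
        if v <= bit <= u:
            found.append(j)
    return found


def extended_bits_by_principles(extra_row=15, rows=(0, 1, 2, 3)):
    """Bits from the position of the extra 1-bit input up to the MSB of the lowest row."""
    low = second_element_bit(extra_row)          # first principle: start at pc15
    high = first_element_range(min(rows))[0]     # second principle: stop at MSB of pp0
    return list(range(low, high + 1))
```

Under these assumptions `extended_bits_by_principles()` yields the thirtieth to thirty-second bits, and `first_elements_at(33)` shows that only three first elements remain on the thirty-third bit.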

FIGS. 3A to 3C are block diagrams cooperatively showing a configuration of the extended 4-input 2-output addition block 1a. FIG. 3A is continuous with FIG. 3B at a virtual line Q5--Q5 and FIG. 3B is continuous with FIG. 3C at a virtual line Q6--Q6.

In the extended 4-input 2-output addition block 1a for parallel addition of 35-bit data, five extended 4-input 2-output adders 100 each for one bit are located on the thirtieth to thirty-fourth bits, twenty-eight 4-input 2-output adders 200 each for one bit are located on the second to twenty-ninth bits, and two 4-input 2-output adders 200 each for one bit are located on the thirty-fifth and thirty-sixth bits.

"0" is inputted to the carry-in Ci of the 4-input 2-output adder 200 on the lowest bit position of the block (the second bit) since no carry is given from the lower position. Then, the carry-out Co of each 4-input 2-output adder 200 is sequentially given to the higher-next 4-input 2-output adder 200 as the carry-in Ci.

The carry-out Co of the 4-input 2-output adder 200 on the twenty-ninth bit is given to the higher-next extended 4-input 2-output adder 100 as the second carry-in Ci2. The first and second carry-outs Co1 and Co2 of the extended 4-input 2-output adders 100 on the thirtieth to thirty-third bits are given to the extended 4-input 2-output adders 100 on the thirty-first to thirty-fourth bits, respectively, as the first and second carry-ins Ci1 and Ci2.

The second element pc₁₅ of the partial product P₁₅ is inputted to the first carry-in Ci1 of the extended 4-input 2-output adder 100 on the thirtieth bit. The first carry-in Ci1 is regarded as having the same weight as the four first elements pp₀ <30>, pp₁ <30>, pp₂ <30> and pp₃ <30>, i.e., the weight of the thirtieth bit position, which complies with the first principle. Naturally, for the same reason, the second element pc₁₅ of the partial product P₁₅ may be inputted to the second carry-in Ci2 of the extended 4-input 2-output adder 100 on the thirtieth bit and the carry-out Co of the 4-input 2-output adder 200 on the twenty-ninth bit may be inputted to the first carry-in Ci1.

The first and second carry-outs Co1 and Co2 of the extended 4-input 2-output adder 100 on the thirty-fourth bit are given to the 4-input 2-output adder 200 on the thirty-fifth bit as one of its inputs ("D" in FIG. 3A) and the carry-in Ci, respectively. The carry-out Co of the 4-input 2-output adder 200 on the thirty-fifth bit is given to the 4-input 2-output adder 200 on the thirty-sixth bit as the carry-in Ci.

The four first elements pp₀ to pp₃ are given to the four inputs A to D of the extended 4-input 2-output adder 100 or the 4-input 2-output adder 200 on the corresponding bit position, adjusting the bit position. On a bit-by-bit basis, the adder 100 or 200 outputs the upper output CO and the lower output SO, which correspond to the upper output co₁ and the lower output so₁ for each bit of the extended 4-input 2-output addition block 1a.

The first element pp₂ has no data to be located on the second and third bit positions and the first element pp₃ has no data to be located on the second to fifth bit positions. On the other hand, the second elements pc₁ and pc₂ are located on the second and fourth bit positions, respectively. Accordingly, the second element pc₁ and "0" are given to the inputs A and B of the 4-input 2-output adder 200 on the second bit respectively, "0" is given to the inputs A and B of the 4-input 2-output adder 200 on the third bit, the second element pc₂ is given to the input A of the 4-input 2-output adder 200 on the fourth bit and "0" is given to the input A of the 4-input 2-output adder 200 on the fifth bit.

According to the first and second principles, only three extended 4-input 2-output adders 100 are needed for the thirtieth to thirty-second bits and the regular 4-input 2-output adders 200 would be located on the thirty-third and thirty-fourth bits. However, as shown in FIG. 3A, the inverted value of the first element pp₀ <32>, instead of the first element pp₀ <32>, is inputted to the input D of the extended 4-input 2-output adder 100 on the thirty-second bit, the inverted value of the first element pp₀ <32>, instead of "0", is inputted to the input D of the extended 4-input 2-output adder 100 on the thirty-third bit, the first element pp₀ <32>, instead of "0", is inputted to the input D of the extended 4-input 2-output adder 100 on the thirty-fourth bit, and "1", instead of "0", is given to the input C of the 4-input 2-output adder 200 on the thirty-fifth bit.

Such a change of summand in the arithmetic operation of signed numbers in the two's complement representation is a well-known technique, termed "one-addition technique" (not discussed in detail herein), for simple sign-bit extension. Since this technique is generally used, the first preferred embodiment needs, on the higher bits, a prescribed number of extended 4-input 2-output adders 100 beyond those required by the first and second principles (complement to the first and second principles). The prescribed number depends on how many orders are used in the Booth algorithm to generate the partial products, and is herein two.

The two first elements pp₂ <35> and pp₃ <35> are inputted in the thirty-fifth bit, and further "1", which is needed according to the one-addition technique, and the first carry-out Co1 on the thirty-fourth bit are inputted therein. Therefore, for the thirty-fifth bit, the regular 4-input 2-output adder 200 may be used since it only has to add these four inputs. It is natural that the regular 4-input 2-output adder 200 may be used also for the thirty-sixth bit since the carry-out Co from the 4-input 2-output adder 200 on the thirty-fifth bit is inputted as the carry-in Ci and the first elements pp₀ and pp₁ are not inputted in the thirty-sixth bit.

Furthermore, since the carry-in Ci and the inputs C and D of the 4-input 2-output adder 200 on the thirty-fifth bit have the same weight, these are exchangeable for each other. The first and second carry-ins Ci1 and Ci2 of the extended 4-input 2-output adder 100 are also exchangeable.

Now, the above extended 4-input 2-output adder 100 will be discussed below. The extended 4-input 2-output adder 100 receives six data of one bit and outputs the lower output SO for its bit and three outputs for the higher-next bit, i.e., the first and second carry-outs Co1 and Co2 and the upper output CO.

Accordingly, the following expression is true:

    A+B+C+D+Ci1+Ci2=2(Co1+Co2+CO)+SO                           (6)

As to one of the extended 4-input 2-output adders 100, the sum of the values of the first and second carry-outs Co1 and Co2 of the lower-next extended 4-input 2-output adder 100 is at most "2" in decimal notation and affects only the upper output CO and the lower output SO to be outputted from that extended 4-input 2-output adder 100. In other words, the first and second carry-outs Co1 and Co2 depend only on the four inputs A to D. Therefore, the carry is not propagated higher by more than one bit.

The first and second carry-outs Co1 and Co2 depend only on the number of "1"s in the four inputs A to D, and the upper output CO reflects the first and second carry-ins Ci1 and Ci2. If the four inputs A to D have one or less "1", there is no carry and both the first and second carry-outs Co1 and Co2 are "0". If the four inputs A to D have two or three "1"s, the first carry-out Co1 is "1" and the second carry-out Co2 is "0". If the four inputs A to D have four "1"s, both the first and second carry-outs Co1 and Co2 are "1".

On the other hand, the lower output SO defines whether the output of the extended 4-input 2-output adder 100 is an odd or even number in decimal notation, and depends on whether the number of "1"s in the four inputs A to D and the first and second carry-ins Ci1 and Ci2 is odd or even.

The upper output CO takes "1" when both the first and second carry-ins Ci1 and Ci2 are "1", or when the four inputs A to D have an odd number of "1"s and at least one of the first and second carry-ins Ci1 and Ci2 is "1", and otherwise takes "0".

In summary, the above relation is shown as Table 4.

                  TABLE 4
    ______________________________________________________
    A   B   C   D    SO              Co1   Co2   CO
    ______________________________________________________
    0   0   0   0    Ci1 ^ Ci2       0     0     Ci1 & Ci2
    0   0   0   1    ~(Ci1 ^ Ci2)    0     0     Ci1 | Ci2
    0   0   1   0    ~(Ci1 ^ Ci2)    0     0     Ci1 | Ci2
    0   0   1   1    Ci1 ^ Ci2       1     0     Ci1 & Ci2
    0   1   0   0    ~(Ci1 ^ Ci2)    0     0     Ci1 | Ci2
    0   1   0   1    Ci1 ^ Ci2       1     0     Ci1 & Ci2
    0   1   1   0    Ci1 ^ Ci2       1     0     Ci1 & Ci2
    0   1   1   1    ~(Ci1 ^ Ci2)    1     0     Ci1 | Ci2
    1   0   0   0    ~(Ci1 ^ Ci2)    0     0     Ci1 | Ci2
    1   0   0   1    Ci1 ^ Ci2       1     0     Ci1 & Ci2
    1   0   1   0    Ci1 ^ Ci2       1     0     Ci1 & Ci2
    1   0   1   1    ~(Ci1 ^ Ci2)    1     0     Ci1 | Ci2
    1   1   0   0    Ci1 ^ Ci2       1     0     Ci1 & Ci2
    1   1   0   1    ~(Ci1 ^ Ci2)    1     0     Ci1 | Ci2
    1   1   1   0    ~(Ci1 ^ Ci2)    1     0     Ci1 | Ci2
    1   1   1   1    Ci1 ^ Ci2       1     1     Ci1 & Ci2
    ______________________________________________________
    ("^" denotes exclusive logical sum.)

Table 4 shows the first example of a truth table of the input-output relation that the extended 4-input 2-output adder 100 should satisfy. The truth table of Table 4 is expressed in Boolean expression as

    Co1=(A|B)&(C|D)|(A&B|C&D)

    Co2=A&B&C&D

    SO=A^B^C^D^Ci1^Ci2

    CO=~(A^B^C^D)&(Ci1&Ci2)|(A^B^C^D)&(Ci1|Ci2)                (7)

where "~" represents logical inversion, "|" represents logical sum, "&" represents logical product and "^" represents exclusive logical sum.
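For illustration, Formula 7 may be transcribed into a short program and checked exhaustively against Expression 6 over all sixty-four input combinations. The following is a minimal sketch; the function names are hypothetical, bits are represented as the integers 0 and 1, and `1 - x` stands for logical inversion.

```python
from itertools import product


def extended_adder_formula7(a, b, c, d, ci1, ci2):
    """One-bit extended 4-input 2-output adder per Formula 7; returns (Co1, Co2, CO, SO)."""
    co1 = ((a | b) & (c | d)) | ((a & b) | (c & d))
    co2 = a & b & c & d
    parity = a ^ b ^ c ^ d                      # A^B^C^D
    so = parity ^ ci1 ^ ci2
    co = ((1 - parity) & (ci1 & ci2)) | (parity & (ci1 | ci2))
    return co1, co2, co, so


def satisfies_expression6(adder):
    """True if A+B+C+D+Ci1+Ci2 == 2*(Co1+Co2+CO)+SO for every input combination."""
    for bits in product((0, 1), repeat=6):
        co1, co2, co, so = adder(*bits)
        if sum(bits) != 2 * (co1 + co2 + co) + so:
            return False
    return True
```

Calling `satisfies_expression6(extended_adder_formula7)` is expected to return `True`, which is one way of confirming that the four outputs together account for every possible sum from zero to six.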

FIG. 4 is a circuit diagram of the first example of a configuration of the extended 4-input 2-output adder 100 on the basis of Formula 7. The inputs A to D are inputted to a NAND gate G1 and an output therefrom is inputted to an inverter G2, and the inverter G2 outputs the second carry-out Co2.

The inputs A and B are inputted to an OR gate G3, the inputs C and D are inputted to an OR gate G4, and outputs from the OR gates G3 and G4 are inputted to a NAND gate G5. The gates G3 to G5 can be constructed as a compound gate.

The inputs A and B are inputted to an AND gate G7, the inputs C and D are inputted to an AND gate G8, and outputs from the AND gates G7 and G8 are inputted to a NOR gate G9. The gates G7 to G9 can be constructed as a compound gate.

The outputs from the NAND gate G5 and the NOR gate G9 are inputted to a NAND gate G6, and the NAND gate G6 outputs the first carry-out Co1.

The inputs A and B are inputted to an XOR gate G17, the inputs C and D are inputted to an XOR gate G18, and outputs from the XOR gates G17 and G18 are inputted to an XOR gate G19. The first and second carry-ins Ci1 and Ci2 are inputted to an XOR gate G20 and outputs from the XOR gates G19 and G20 are inputted to an XOR gate G21. The XOR gate G21 outputs the lower output SO.

The first and second carry-ins Ci1 and Ci2 are inputted to a NAND gate G10, and also inputted to a NOR gate G11 and an output therefrom is inputted to an inverter G12. An output from the NAND gate G10, together with the output from the XOR gate G19, is inputted to a NOR gate G13. An output from the inverter G12, together with the output from the XOR gate G19, is inputted to an AND gate G14. Outputs from the NOR gate G13 and the AND gate G14 are inputted to a NOR gate G15 and an output therefrom is inputted to an inverter G16. The gates G14 and G15 can be constructed as a compound gate. The inverter G16 outputs the upper output CO.
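The gate connections of FIG. 4 described above may be checked with a small gate-level model such as the following sketch. It follows only the textual description (the figure itself is not reproduced here), uses hypothetical helper names, and compares the result against an arithmetic reference built from the counting rule stated earlier for the first example.

```python
from itertools import product


def inv(x):
    return 1 - x


def nand(*xs):
    return 1 - int(all(xs))


def nor(*xs):
    return 1 - int(any(xs))


def fig4_adder(a, b, c, d, ci1, ci2):
    """Gate-level sketch of FIG. 4 as described in the text; returns (Co1, Co2, CO, SO)."""
    co2 = inv(nand(a, b, c, d))        # G1, G2
    g5 = nand(a | b, c | d)            # G3, G4, G5
    g9 = nor(a & b, c & d)             # G7, G8, G9
    co1 = nand(g5, g9)                 # G6
    g19 = (a ^ b) ^ (c ^ d)            # G17, G18, G19
    so = g19 ^ (ci1 ^ ci2)             # G20, G21
    g10 = nand(ci1, ci2)               # G10
    g12 = inv(nor(ci1, ci2))           # G11, G12
    g13 = nor(g10, g19)                # G13
    g14 = g12 & g19                    # G14
    co = inv(nor(g13, g14))            # G15, G16
    return co1, co2, co, so


def reference_first_example(a, b, c, d, ci1, ci2):
    """Arithmetic reference: carries from the count of 1s in A to D, CO/SO from the rest."""
    n = a + b + c + d
    t = (n & 1) + ci1 + ci2
    return int(n >= 2), int(n == 4), t >> 1, t & 1


def fig4_matches_reference():
    return all(fig4_adder(*bits) == reference_first_example(*bits)
               for bits in product((0, 1), repeat=6))
```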

The first and second carry-outs Co1 and Co2, having the same weight, may take exchangeable values. If the four inputs A to D have two "1"s, the first and second carry-outs Co1 and Co2 may take "1" and "0" respectively, and if the four inputs A to D have three "1"s, the first and second carry-outs Co1 and Co2 may take "0" and "1" respectively.

Adding this change to Table 4, the result is shown in Table 5.

                  TABLE 5
    ______________________________________________________
    A   B   C   D    SO              Co1   Co2   CO
    ______________________________________________________
    0   0   0   0    Ci1 ^ Ci2       0     0     Ci1 & Ci2
    0   0   0   1    ~(Ci1 ^ Ci2)    0     0     Ci1 | Ci2
    0   0   1   0    ~(Ci1 ^ Ci2)    0     0     Ci1 | Ci2
    0   0   1   1    Ci1 ^ Ci2       1     0     Ci1 & Ci2
    0   1   0   0    ~(Ci1 ^ Ci2)    0     0     Ci1 | Ci2
    0   1   0   1    Ci1 ^ Ci2       1     0     Ci1 & Ci2
    0   1   1   0    Ci1 ^ Ci2       1     0     Ci1 & Ci2
    0   1   1   1    ~(Ci1 ^ Ci2)    0     1     Ci1 | Ci2
    1   0   0   0    ~(Ci1 ^ Ci2)    0     0     Ci1 | Ci2
    1   0   0   1    Ci1 ^ Ci2       1     0     Ci1 & Ci2
    1   0   1   0    Ci1 ^ Ci2       1     0     Ci1 & Ci2
    1   0   1   1    ~(Ci1 ^ Ci2)    0     1     Ci1 | Ci2
    1   1   0   0    Ci1 ^ Ci2       1     0     Ci1 & Ci2
    1   1   0   1    ~(Ci1 ^ Ci2)    0     1     Ci1 | Ci2
    1   1   1   0    ~(Ci1 ^ Ci2)    0     1     Ci1 | Ci2
    1   1   1   1    Ci1 ^ Ci2       1     1     Ci1 & Ci2
    ______________________________________________________

Table 5 shows the second example of the truth table of the input-output relation that the extended 4-input 2-output adder 100 should satisfy. The truth table of Table 5 is expressed in Boolean expression as

    Co1=(A^B)&(C^D)|~(A^B)&C&D|~(C^D)&A&B

    Co2=A&B&(C|D)|C&D&(A|B)

    SO=A^B^C^D^Ci1^Ci2

    CO=~(A^B^C^D)&(Ci1&Ci2)|(A^B^C^D)&(Ci1|Ci2)                (8)
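As a hedged illustration, the carry-out expressions of Formulas 7 and 8 may be compared directly. The sketch below (hypothetical function names, bits as integers 0 and 1) lists the input combinations on which the two examples disagree and confirms that the sum Co1+Co2 is the same in both.

```python
from itertools import product


def carries_formula7(a, b, c, d):
    """(Co1, Co2) of the first example."""
    co1 = ((a | b) & (c | d)) | ((a & b) | (c & d))
    co2 = a & b & c & d
    return co1, co2


def carries_formula8(a, b, c, d):
    """(Co1, Co2) of the second example."""
    x_ab, x_cd = a ^ b, c ^ d
    co1 = (x_ab & x_cd) | ((1 - x_ab) & c & d) | ((1 - x_cd) & a & b)
    co2 = (a & b & (c | d)) | (c & d & (a | b))
    return co1, co2


def compare_examples():
    """Returns (inputs where the carry pairs differ, whether Co1+Co2 always agrees)."""
    differing = []
    same_total = True
    for bits in product((0, 1), repeat=4):
        c7, c8 = carries_formula7(*bits), carries_formula8(*bits)
        if c7 != c8:
            differing.append(bits)
        if sum(c7) != sum(c8) or sum(c7) != sum(bits) // 2:
            same_total = False
    return differing, same_total
```

Under this sketch the two examples differ only on the four combinations containing exactly three "1"s, where the pair (Co1, Co2) is (1, 0) in the first example and (0, 1) in the second.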

FIG. 5 is a circuit diagram of the second example of the configuration of the extended 4-input 2-output adder 100 on the basis of Formula 8. The configuration to obtain the lower output SO and the upper output CO by using the gates G17 to G21 is the same as that of FIG. 4.

The inputs A and B are inputted to an OR gate G31, and an output therefrom and the inputs C and D are inputted to a NAND gate G33. The gates G31 and G33 can be constructed as a compound gate. The inputs C and D are inputted to an OR gate G32, and an output therefrom and the inputs A and B are inputted to a NAND gate G34. The gates G32 and G34 can be constructed as a compound gate. Outputs from the NAND gates G33 and G34 are inputted to a NAND gate G35 and the NAND gate G35 outputs the second carry-out Co2.

The inputs A and B are inputted to a NAND gate G36 and the inputs C and D are inputted to a NAND gate G37. An output from the NAND gate G36, together with the output from the XOR gate G18, is inputted to a NOR gate G38. An output from the NAND gate G37, together with the output from the XOR gate G17, is inputted to a NOR gate G39. Outputs from the XOR gates G17 and G18 are inputted to an AND gate G40. Outputs from the NOR gates G38 and G39 and the AND gate G40 are inputted to a NOR gate G41 and an output therefrom is inputted to an inverter G42, and the inverter G42 outputs the first carry-out Co1. The gates G40 and G41 can be constructed as a compound gate.
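The carry-out portion of FIG. 5 may likewise be modelled gate by gate. The following sketch follows the textual description only and uses hypothetical helper names; the gate G33 is modelled as a NAND gate, which is the reading that reproduces Formula 8. The result is compared with the counting rule of the second example (Co1 is "1" for two or four "1"s, Co2 is "1" for three or four "1"s).

```python
from itertools import product


def inv(x):
    return 1 - x


def nand(*xs):
    return 1 - int(all(xs))


def nor(*xs):
    return 1 - int(any(xs))


def fig5_carries(a, b, c, d):
    """Gate-level sketch of the Co1/Co2 logic of FIG. 5; returns (Co1, Co2)."""
    g17, g18 = a ^ b, c ^ d            # XOR gates shared with the SO/CO logic
    g33 = nand(a | b, c, d)            # G31 feeding G33 (assumed NAND)
    g34 = nand(c | d, a, b)            # G32 feeding G34
    co2 = nand(g33, g34)               # G35
    g36 = nand(a, b)
    g37 = nand(c, d)
    g38 = nor(g36, g18)                # A&B with C, D equal
    g39 = nor(g37, g17)                # C&D with A, B equal
    g40 = g17 & g18                    # exactly one "1" in each pair
    co1 = inv(nor(g38, g39, g40))      # G41, G42
    return co1, co2


def fig5_matches_table5():
    for bits in product((0, 1), repeat=4):
        n = sum(bits)
        if fig5_carries(*bits) != (int(n in (2, 4)), int(n >= 3)):
            return False
    return True
```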

As can be seen from comparison between Tables 4 and 5, differences between these tables are found only in sections where the first and second carry-outs Co1 and Co2 are exchanged. In other words, there is no difference with respect to the lower output SO and the upper output CO. Both the lower output SO and the upper output CO are functions of logical product, logical sum and exclusive logical sum of the first and second carry-ins Ci1 and Ci2, and the first and second carry-ins Ci1 and Ci2 are commutative in these logical operations.
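The observation above can be illustrated with a row of one-bit extended adders in which each bit passes its two carry-outs to the higher-next bit. The sketch below is a simplified model under stated assumptions: every bit position uses an extended adder, the operands are unsigned, and the names `table4_carries`, `table5_carries`, `row` and `check` are hypothetical. It confirms, on random operands, that both carry assignments give identical SO and CO words and that the sum of all inputs is preserved.

```python
import random


def table4_carries(n):
    """(Co1, Co2) of the first example as a function of the number of 1s in A to D."""
    return int(n >= 2), int(n == 4)


def table5_carries(n):
    """(Co1, Co2) of the second example."""
    return int(n in (2, 4)), int(n >= 3)


def row(operands, width, carries, ci1=0, ci2=0):
    """Adds four words bit by bit; returns (so_word, co_word, Co1, Co2 of the top bit).

    co_word is already shifted to its weight, i.e. CO of bit i lands on bit i+1.
    """
    so = co = 0
    for i in range(width):
        n = sum((x >> i) & 1 for x in operands)
        t = (n & 1) + ci1 + ci2        # parity of A to D plus the two carry-ins
        so |= (t & 1) << i
        co |= (t >> 1) << (i + 1)
        ci1, ci2 = carries(n)          # carries depend on A to D only
    return so, co, ci1, ci2


def check(trials=200, width=8, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        ops = [rng.getrandbits(width) for _ in range(4)]
        c_in = rng.getrandbits(1)
        so4, co4, a1, a2 = row(ops, width, table4_carries, ci1=c_in)
        so5, co5, b1, b2 = row(ops, width, table5_carries, ci1=c_in)
        if (so4, co4) != (so5, co5):
            return False
        if sum(ops) + c_in != so4 + co4 + ((a1 + a2) << width):
            return False
        if a1 + a2 != b1 + b2:
            return False
    return True
```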

The following equations are true: ##EQU4##

Thus, the configuration of the extended 4-input 2-output adder 100 has only to satisfy the following Boolean expression, not being limited to such configurations as shown in FIGS. 4 and 5. Specifically, the extended 4-input 2-output adders 100 of FIG. 3A may have different configurations. ##EQU5##

In the circuit of FIG. 4 or 5, the critical path of the extended 4-input 2-output adder 100 is not the path through the gates G17 (or G18), G19 and G21. That is because the first and second carry-ins Ci1 and Ci2 are determined after the outputs from the XOR gates G17 and G18 are determined.

Since the first and second carry-ins Ci1 and Ci2 take the first and second carry-outs Co1 and Co2 from the lower-next bit, respectively, it is necessary to estimate the time required to determine the carry-outs Co1 and Co2. The number of gate-stages required to determine the first and second carry-outs Co1 and Co2 is two in the circuit of FIG. 4 and four in that of FIG. 5 (the circuit of FIG. 4 needs fewer gate-stages to obtain the first and second carry-outs Co1 and Co2 than the circuit of FIG. 5). Considering that the delay of one stage of XOR is generally larger than that of one stage of other logical gate and corresponds to about two stages thereof, as discussed earlier, the delay time required to determine the first and second carry-outs Co1 and Co2 is less than that of two stages of XORs.

To obtain the lower output SO, the delay time of two stages of XOR gates, i.e., the gates G20 and G21, is further needed after the first and second carry-outs Co1 and Co2 are determined. After all, the delay time ranges from three stages of XORs to four stages of XORs. However, the extended 4-input 2-output adder 100 is delayed by less than one stage of XOR as compared with the 4-input 2-output adder 200.

As to the addition blocks shown in FIGS. 1 and 2A to 2C, the addition is performed in the order of the first stage of the tree circuit (the extended 4-input 2-output addition block 1a and the 4-input 2-output addition blocks 2a to 2c), the second stage (the 4-input 2-output addition blocks 2d and 2e) and the third stage (the 4-input 2-output addition block 2f). Accordingly, the delay time from the determination of the first elements pp₀ to pp₁₅ and the second elements pc₀ to pc₁₅ of the partial products to the determination of the lower output so₇ and the upper output co₇ of the 4-input 2-output addition block 2f as the two eventual intermediate sums is (3+α)+2×3=9+α (0<α<1) stages of XORs since the critical path goes through one stage of extended 4-input 2-output addition block and two stages of 4-input 2-output addition blocks.
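The delay estimate above amounts to simple arithmetic, which may be written out as the following sketch. The function name and its parameters are hypothetical; the value of α is not specified beyond 0<α<1 and is therefore left as an argument.

```python
def tree_delay_in_xor_stages(alpha, regular_stages=2, regular_block_delay=3):
    """Critical-path delay in XOR stages: one extended block plus the regular blocks.

    alpha is the extra delay of the extended block over a regular one (0 < alpha < 1).
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha is assumed to lie strictly between 0 and 1")
    return (3 + alpha) + regular_stages * regular_block_delay
```

For any admissible α the result lies strictly between nine and ten stages of XORs; for example, `tree_delay_in_xor_stages(0.5)` gives 9.5.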

In summary, the tree circuit of the first preferred embodiment needs the delay time of less than ten stages of XORs, thus ensuring higher-speed operation as compared with the background art. Moreover, it needs only seven addition blocks and accordingly can reduce the circuit scale. As discussed above, the first preferred embodiment is achieved only by replacing the background-art 4-input 2-output adders 200 by the extended 4-input 2-output adders 100 on a bit-by-bit basis. Furthermore, among the thirty-five adders that form the extended 4-input 2-output addition block 1a, only five adders have to be replaced.

In comparison between the circuits of FIGS. 4 and 18, the increase in circuit scale due to replacement of the 4-input 2-output addition block 22a by the extended 4-input 2-output addition block 1a is negligible on the whole of the multiplier since the ratio of the extended 4-input 2-output adder 100 to the 4-input 2-output adder 200 in circuit scale is about 1.5 to 1.

Except for the extended 4-input 2-output addition block 1a, the configuration of this preferred embodiment is achieved using the background art. Detailed discussion will be given referring to FIGS. 2A to 2C. The tenth to fortieth bits of the first element pp₄ <40:8> of the partial product P₄, all bits of the first element pp₅ <42:10> and the second element pc₅ of the partial product P₅, all bits of the first element pp₆ <44:12> and the second element pc₆ of the partial product P₆, and the fourteenth to forty-fourth bits of the first element pp₇ <46:14> of the partial product P₇ are inputted to the 4-input 2-output addition block 2a, adjusting the bit positions.

The second element pc₅ of the partial product P₅ and the second element pc₆ of the partial product P₆ are dealt with as pseudo lower bits of the first element pp₇ of the partial product P₇.

The eighth and ninth bits of the first element pp₄ of the partial product P₄ are dealt with as pseudo lower bits of the upper output co₂ <45:11> of the 4-input 2-output addition block 2a and propagated to the 4-input 2-output addition block 2d since the 4-input 2-output addition block 2a does not cover their bit positions.

The second element pc₄ of the partial product P₄ is propagated to the 4-input 2-output addition block 2d since the 4-input 2-output addition block 2a does not cover its bit position.

The forty-fifth bit of the first element pp₇ of the partial product P₇ is dealt with as a pseudo upper bit of the lower output so₂ <44:10> of the 4-input 2-output addition block 2a and propagated to the 4-input 2-output addition block 2d since the 4-input 2-output addition block 2a does not cover its bit position. The forty-sixth bit of the first element pp₇ of the partial product P₇ is dealt with as a pseudo upper bit of the lower output so₅ <45:6> of the 4-input 2-output addition block 2d and propagated to the 4-input 2-output addition block 2f since neither the 4-input 2-output addition block 2a nor 2d covers its bit position.

The second element pc₇ of the partial product P₇ is not added in the 4-input 2-output addition block 2a and is propagated to the 4-input 2-output addition block 2f since four data already exist on its bit position.

The 4-input 2-output addition block 2f is located on the fourteenth bit and higher. Accordingly, the sixth to thirteenth bits of the lower output so₅ of the 4-input 2-output addition block 2d, along with the second to fifth bits of the lower output so₁ and the zeroth and first bits of the first element pp₀ which are dealt with as the pseudo lower bits thereof, are dealt with as pseudo lower bits of the lower output so₇ <62:14> of the 4-input 2-output addition block 2f to be finally added.

Similarly, the seventh to thirteenth bits of the upper output co₅ of the 4-input 2-output addition block 2d, along with the third to fifth bits of the upper output co₁ and the second element pc₀ which are dealt with as the pseudo lower bits thereof, are dealt with as pseudo lower bits of the upper output co₇ <63:15> of the 4-input 2-output addition block 2f to be finally added.

The eighteenth to forty-eighth bits of the first element pp₈ <48:16> of the partial product P₈, all bits of the first element pp₉ <50:18> and the second element pc₉ of the partial product P₉, all bits of the first element pp₁₀ <52:20> and the second element pc₁₀ of the partial product P₁₀, and the twenty-second to fifty-second bits of the first element pp₁₁ <54:22> of the partial product P₁₁ are inputted to the 4-input 2-output addition block 2b, adjusting the bit positions.

The second element pc₉ of the partial product P₉ and the second element pc₁₀ of the partial product P₁₀ are dealt with as pseudo lower bits of the first element pp₁₁ of the partial product P₁₁.

The sixteenth and seventeenth bits of the first element pp₈ of the partial product P₈ are dealt with as pseudo lower bits of the upper output co₃ <53:19> of the 4-input 2-output addition block 2b since the 4-input 2-output addition block 2b does not cover their bit positions.

The second element pc₈ of the partial product P₈ is propagated to the 4-input 2-output addition block 2f since the 4-input 2-output addition block 2b does not cover its bit position.

The fifty-third and fifty-fourth bits of the first element pp₁₁ of the partial product P₁₁ are dealt with as pseudo upper bits of the lower output so₃ <52:18> of the 4-input 2-output addition block 2b and propagated to the 4-input 2-output addition block 2e since the 4-input 2-output addition block 2b does not cover their bit positions.

The second element pc₁₁ of the partial product P₁₁ is not added in the 4-input 2-output addition block 2b and is propagated to the 4-input 2-output addition block 2e since four data already exist on its bit position.

The 4-input 2-output addition block 2e is located on the twenty-second bit and higher. Accordingly, the eighteenth to twenty-first bits of the lower output so₃ of the 4-input 2-output addition block 2b are dealt with as pseudo lower bits of the lower output so₆ <61:22> of the 4-input 2-output addition block 2e and propagated to the 4-input 2-output addition block 2f.

Similarly, the nineteenth to twenty-first bits of the upper output co₃ of the 4-input 2-output addition block 2b, along with the sixteenth and seventeenth bits of the first element pp₈ of the partial product P₈ which are pseudo lower bits thereof, are dealt with as pseudo lower bits of the upper output co₆ <62:23> of the 4-input 2-output addition block 2e and propagated to the 4-input 2-output addition block 2f.

The twenty-sixth to fifty-sixth bits of the first pp₁₂ <56:24> of thepartial product P₁₂, all bits of the first element pp₁₃ <58:26> and thesecond element pc₁₃ of the partial product P₁₃, all bits of the firstelement pp₁₄ <60:28> and the second element pc₁₄ of the partial productP₁₄, and the thirtieth to sixtieth bits of the first element pp₁₅<62:30> of the partial product P₁₅ are inputted to the 4-input 2-outputaddition block 2c, adjusting the bit positions.

The second element pc₁₃ of the partial product P₁₃ and the secondelement pc₁₄ of the partial product P₁₄ are dealt with as pseudo lowerbits of the first element pp₁₅ of the partial product P₁₅.

The twenty-fourth and twenty-fifth bits of the first element pp₁₂ of the partial product P₁₂ are dealt with as pseudo lower bits of the upper output co₄ <61:27> of the 4-input 2-output addition block 2c since the 4-input 2-output addition block 2c does not cover their bit positions.

The second element pc₁₂ of the partial product P₁₂ is propagated to the 4-input 2-output addition block 2e since the 4-input 2-output addition block 2c does not cover its bit position.

The sixty-first bit of the first element pp₁₅ of the partial product P₁₅ is dealt with as a pseudo upper bit of the lower output so₄ <60:26> of the 4-input 2-output addition block 2c and propagated to the 4-input 2-output addition block 2e since the 4-input 2-output addition block 2c does not cover its bit position. The sixty-second bit of the first element pp₁₅ of the partial product P₁₅ is dealt with as a pseudo upper bit of the lower output so₆ <61:22> of the 4-input 2-output addition block 2e and propagated to the 4-input 2-output addition block 2f since the 4-input 2-output addition block 2c does not cover its bit position.

The 4-input 2-output addition block 2f performs addition of all the bits of the upper output co₆ and the lower output so₆ of the 4-input 2-output addition block 2e, the fourteenth and higher bits of the upper output co₅ and the lower output so₅ of the 4-input 2-output addition block 2d, the twenty-first and lower bits of the upper output co₃ and the lower output so₃ of the 4-input 2-output addition block 2b and the second elements pc₇ and pc₈. In this addition, it is clear from the figure that the number of inputs is four or less on the same bit position.

Since the arithmetic operation is performed while adjusting the bit positions, in terms of the delay time alone, the second element pc₁₅ may be inputted to any of the seven addition blocks as long as it is inputted to the same bit position (the thirtieth bit). For example, the extended 4-input 2-output addition block 1a may be replaced by a regular 4-input 2-output addition block and the 4-input 2-output addition block 2a may be replaced by an extended 4-input 2-output addition block.

However, as mentioned above, it must be considered that the disadvantageous increase of the circuit scale is not negligible when the number of extended 4-input 2-output adders, i.e., the bit width, of the extended 4-input 2-output addition block increases.

Therefore, the minimum value of the number of the extended 4-input 2-output adders (bit width) constituting the extended 4-input 2-output addition block depends on the bit position of the second element pc₁₅ (the thirtieth bit herein) (the first principle), the most significant bit position of the one of the four multi-bit input data whose most significant bit is the lowest (pp₀ herein) (the second principle) and the number of bits required for implementation of the "one-addition technique" (two herein, which depends on the order of the Booth algorithm, as a complement to the first and second principles).

In other words, to achieve the first preferred embodiment with the best area-efficiency, the highest one of the second elements pc_(j) generated according to the secondary Booth algorithm and the one of the first elements pp_(j) whose most significant bit is the lowest have only to be inputted to the same extended 4-input 2-output addition block.

The Second Preferred Embodiment

As is clear from Formula 10, the values of the first and second carry-outs Co1 and Co2 transmitted between the extended 4-input 2-output adders 100 may have no meaning of carry. The logical product, logical sum and exclusive logical sum (or inversion thereof) of the first and second carry-outs Co1 and Co2 have only to be transmitted between the extended 4-input 2-output adders 100.

In view of that, it is possible to simplify the configuration of the extended 4-input 2-output adder. Specifically, the first and second pseudo carry-outs Coa and Cob are used instead of the first and second carry-outs Co1 and Co2. These pseudo carry-outs Coa and Cob serve as the first and second pseudo carry-ins Cia and Cib of the higher-next 4-input 2-output adder. For example, when the first and second pseudo carry-outs Coa and Cob are determined as

    Coa=Co1|Co2, Cob=˜(Co1&Co2)                 (11)

the following Formula is true, where ^ denotes the exclusive logical sum:

    Coa&Cob=Co1^Co2                                            (12)
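Formulae 11 and 12 can be checked exhaustively over the four possible values of (Co1, Co2). The following Python sketch is only an illustration: the function name `pseudo_carry` is hypothetical and not part of the original description, and the 1-bit inversion is modeled as `1 - x`. It confirms that the logical product, logical sum and exclusive logical sum of Co1 and Co2 remain recoverable from Coa and Cob.

```python
from itertools import product


def pseudo_carry(co1, co2):
    """Formula 11 on 1-bit values: Coa = Co1 | Co2, Cob = ~(Co1 & Co2)."""
    coa = co1 | co2
    cob = 1 - (co1 & co2)  # 1-bit inversion
    return coa, cob


for co1, co2 in product((0, 1), repeat=2):
    coa, cob = pseudo_carry(co1, co2)
    assert coa & cob == co1 ^ co2   # Formula 12: exclusive logical sum
    assert coa == co1 | co2         # logical sum
    assert 1 - cob == co1 & co2     # logical product
```

Note that the pair (Coa, Cob) = (0, 0) never occurs, so the pair only carries the number of carries (0, 1 or 2), not their individual identities.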

When Formula 10 is rewritten using the first and second pseudo carry-outs Coa and Cob and the first and second pseudo carry-ins Cia and Cib,

    Coa=(A|B)&(C|D)|(A&B|C&D)

    Cob=˜(A&B&C&D)

    SO=A^B^C^D^(Cia&Cib)

    CO=˜(A^B^C^D)&˜Cib|(A^B^C^D)&Cia      (13)

Such pseudo carry suffices for transmission between the extended 4-input 2-output adders. A truth table of input-output relation that the extended 4-input 2-output adder should satisfy is shown in Table 6.

                  TABLE 6
    ____________________________________________________
    A   B   C   D   SO              Coa   Cob   CO
    ____________________________________________________
    0   0   0   0   Cia & Cib       0     1     ˜Cib
    0   0   0   1   ˜(Cia & Cib)    0     1     Cia
    0   0   1   0   ˜(Cia & Cib)    0     1     Cia
    0   0   1   1   Cia & Cib       1     1     ˜Cib
    0   1   0   0   ˜(Cia & Cib)    0     1     Cia
    0   1   0   1   Cia & Cib       1     1     ˜Cib
    0   1   1   0   Cia & Cib       1     1     ˜Cib
    0   1   1   1   ˜(Cia & Cib)    1     1     Cia
    1   0   0   0   ˜(Cia & Cib)    0     1     Cia
    1   0   0   1   Cia & Cib       1     1     ˜Cib
    1   0   1   0   Cia & Cib       1     1     ˜Cib
    1   0   1   1   ˜(Cia & Cib)    1     1     Cia
    1   1   0   0   Cia & Cib       1     1     ˜Cib
    1   1   0   1   ˜(Cia & Cib)    1     1     Cia
    1   1   1   0   ˜(Cia & Cib)    1     1     Cia
    1   1   1   1   Cia & Cib       1     0     ˜Cib
    ____________________________________________________
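The relation of Table 6 can be read as an arithmetic identity: the four inputs plus the number of carries encoded by (Cia, Cib) must equal SO plus twice the sum of CO and the number of carries encoded by (Coa, Cob). The Python sketch below tests this reading for every valid input. It is a minimal model under stated assumptions: the helper names `inv`, `adder_111`, `carry_count` and `check_adder_111` are hypothetical, and the arithmetic identity is an interpretation inferred from Formulae 11 to 13 rather than a statement quoted from the description.

```python
from itertools import product


def inv(x):
    """1-bit inversion."""
    return x ^ 1


def adder_111(a, b, c, d, cia, cib):
    """Formula 13; returns (SO, CO, Coa, Cob) for 1-bit inputs."""
    x = a ^ b ^ c ^ d
    coa = ((a | b) & (c | d)) | (a & b) | (c & d)
    cob = inv(a & b & c & d)
    so = x ^ (cia & cib)
    co = (inv(x) & inv(cib)) | (x & cia)
    return so, co, coa, cob


def carry_count(ca, cb):
    """Number of carries (0, 1 or 2) encoded by a pseudo carry pair."""
    return (ca & cb) + 2 * inv(cb)


def check_adder_111():
    # Valid pseudo carry-in pairs: (0,1) = 0 carries, (1,1) = 1, (1,0) = 2.
    # The pair (0,0) cannot be produced by Formula 11 and is skipped.
    for a, b, c, d in product((0, 1), repeat=4):
        for cia, cib in ((0, 1), (1, 1), (1, 0)):
            so, co, coa, cob = adder_111(a, b, c, d, cia, cib)
            lhs = a + b + c + d + carry_count(cia, cib)
            rhs = 2 * (carry_count(coa, cob) + co) + so
            assert lhs == rhs
    return True


check_adder_111()
```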

FIG. 6 is a circuit diagram showing a configuration of the extended 4-input 2-output adder 111 which satisfies the relation of Table 6. As can be seen from comparison between Formulae 13 and 7, the first pseudo carry-out Coa is equivalent to the first carry-out Co1 and the second pseudo carry-out Cob is equivalent to an inversion of the second carry-out Co2. Therefore, the first and second pseudo carry-outs Coa and Cob can be provided by the adder having the construction of the gates G1 to G9 shown in FIG. 4 except the inverter G2.

Although the extended 4-input 2-output adder 111 gives the pseudo carry, it still outputs the upper output CO and the lower output SO. The inputs A and B are inputted to the XOR gate G17 and the inputs C and D are inputted to the XOR gate G18. The outputs from the XOR gates G17 and G18 are inputted to the XOR gate G19. The first and second pseudo carry-ins Cia and Cib are inputted to a NAND gate G51.

An output from the NAND gate G51 and the output from the XOR gate G19 are inputted to an XNOR gate G52. The XNOR gate G52 outputs the lower output SO.

The second pseudo carry-in Cib and the output of the XOR gate G19 are inputted to the NOR gate G13. The first pseudo carry-in Cia and the output from the XOR gate G19 are inputted to the AND gate G14. The outputs from the NOR gate G13 and the AND gate G14 are inputted to the NOR gate G15 and the output therefrom is inputted to the inverter G16. The gates G14 and G15 can be constructed as a compound gate. The inverter G16 outputs the upper output CO.
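The gate connections described above for the upper output CO and the lower output SO can be compared with Formula 13 by simulation. The following Python sketch is an illustrative model only: the function names are hypothetical, the gate labels in the comments follow the description of FIG. 6, and the generation of Coa and Cob (gates G1 to G9) is omitted.

```python
from itertools import product


def nand(x, y):
    return 1 - (x & y)


def nor(x, y):
    return 1 - (x | y)


def xnor(x, y):
    return 1 - (x ^ y)


def fig6_outputs(a, b, c, d, cia, cib):
    """Gate-level model of the SO/CO part of adder 111 as described."""
    g17 = a ^ b              # XOR gate G17
    g18 = c ^ d              # XOR gate G18
    g19 = g17 ^ g18          # XOR gate G19
    g51 = nand(cia, cib)     # NAND gate G51
    so = xnor(g51, g19)      # XNOR gate G52 -> lower output SO
    g13 = nor(cib, g19)      # NOR gate G13
    g14 = cia & g19          # AND gate G14
    g15 = nor(g13, g14)      # NOR gate G15
    co = 1 - g15             # inverter G16 -> upper output CO
    return so, co


def formula13_outputs(a, b, c, d, cia, cib):
    """SO and CO written directly from Formula 13."""
    x = a ^ b ^ c ^ d
    so = x ^ (cia & cib)
    co = ((1 - x) & (1 - cib)) | (x & cia)
    return so, co


# Compare the two models on all 64 input combinations.
for v in product((0, 1), repeat=6):
    assert fig6_outputs(*v) == formula13_outputs(*v)
```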

With the above configuration, the extended 4-input 2-output adder 111 has four fewer gates than the extended 4-input 2-output adder 100 of FIG. 4, and is therefore simpler.

However, when only the extended 4-input 2-output adders 111 are serially connected, the highest one of the second elements pc_(j) (pc₁₅ in FIGS. 1 and 2A to 2C) and the carry-out Co from the regular 4-input 2-output adder 200 located lower next to the highest one of the second elements pc_(j) (the twenty-ninth bit in FIGS. 1 and 2A to 2C) cannot be properly processed. Conversely, the first and second pseudo carry-outs Coa and Cob of the extended 4-input 2-output adder 111 cannot be used as the carry-in Ci or the input D of the regular 4-input 2-output adder 200.

Accordingly, other types of extended 4-input 2-output adders are needed on the high and low ends of the serial connection of the extended 4-input 2-output adders 111 to adapt them to the regular 4-input 2-output adders 200.

FIG. 7 is a block diagram showing part of the configuration of the extended 4-input 2-output addition block 1a, corresponding to FIG. 3A. That is, FIG. 7 is continuous with FIG. 3B at a virtual line Q5--Q5. FIG. 7 shows the configuration of the extended 4-input 2-output addition block 1a cooperatively with FIGS. 3B and 3C. In other words, a tree circuit of the second preferred embodiment has the same configuration as that of FIG. 1, and the extended 4-input 2-output addition block 1a of the second preferred embodiment is achieved by replacing the configuration of FIG. 3A by that of FIG. 7.

FIG. 7 shows the configuration where the extended 4-input 2-output adders 100 of FIG. 3A are replaced by extended 4-input 2-output adders 110 to 112. In more detail, the extended 4-input 2-output adder 110 is located on the thirtieth bit, the extended 4-input 2-output adders 111 are located on the thirty-first to thirty-third bits and the extended 4-input 2-output adder 112 is located on the thirty-fourth bit, instead of the extended 4-input 2-output adders 100.

The second element pc₁₅ is inputted to an input E of the extended 4-input 2-output adder 110 and the carry-out Co of the 4-input 2-output adder 200 on the twenty-ninth bit is inputted to the extended 4-input 2-output adder 110 as the carry-in Ci.

On the thirty-fifth bit, the two first elements pp₂ and pp₃ of the partial product are located and further "1" is located according to the "one-addition technique", and the input D of the 4-input 2-output adder 200 on this bit position is available. Then, the first carry-out Co1 of the extended 4-input 2-output adder 112 is inputted to the input D of the 4-input 2-output adder 200 and the second carry-out Co2 of the extended 4-input 2-output adder 112 is inputted to the carry-in Ci of the 4-input 2-output adder 200.

Naturally, the carry-out Co may be inputted to the input E of the extended 4-input 2-output adder 110 and the second element pc₁₅ may be inputted to the carry-in Ci. The second carry-out Co2 of the extended 4-input 2-output adder 112 may be inputted to the input D of the 4-input 2-output adder 200 on the thirty-fifth bit and the first carry-out Co1 of the extended 4-input 2-output adder 112 may be inputted to the carry-in Ci of the 4-input 2-output adder 200 on the thirty-fifth bit. The connections between the extended 4-input 2-output adders 110 to 112, however, cannot be exchanged.

To achieve the same function as the extended 4-input 2-output adders 100 of FIG. 3A, in the extended 4-input 2-output adders 110 to 112 arranged as above, the extended 4-input 2-output adder 110 has to generate the first and second pseudo carry-outs Coa and Cob from the inputs A to D, the carry-in Ci and the input E, and the extended 4-input 2-output adder 112 has to generate the first and second carry-outs Co1 and Co2 from the inputs A to D and the first and second pseudo carry-ins Cia and Cib.

Table 7 is a truth table of the function of the extended 4-input 2-output adder 110 and Formula 14 is a Boolean expression satisfying Table 7. FIG. 8 is a circuit diagram illustrating a configuration of the extended 4-input 2-output adder 110 which satisfies Formula 14.

                  TABLE 7
    ____________________________________________________
    A   B   C   D   SO              Coa   Cob   CO
    ____________________________________________________
    0   0   0   0   E ^ Ci          0     1     E & Ci
    0   0   0   1   ˜(E ^ Ci)       0     1     E | Ci
    0   0   1   0   ˜(E ^ Ci)       0     1     E | Ci
    0   0   1   1   E ^ Ci          1     1     E & Ci
    0   1   0   0   ˜(E ^ Ci)       0     1     E | Ci
    0   1   0   1   E ^ Ci          1     1     E & Ci
    0   1   1   0   E ^ Ci          1     1     E & Ci
    0   1   1   1   ˜(E ^ Ci)       1     1     E | Ci
    1   0   0   0   ˜(E ^ Ci)       0     1     E | Ci
    1   0   0   1   E ^ Ci          1     1     E & Ci
    1   0   1   0   E ^ Ci          1     1     E & Ci
    1   0   1   1   ˜(E ^ Ci)       1     1     E | Ci
    1   1   0   0   E ^ Ci          1     1     E & Ci
    1   1   0   1   ˜(E ^ Ci)       1     1     E | Ci
    1   1   1   0   ˜(E ^ Ci)       1     1     E | Ci
    1   1   1   1   E ^ Ci          1     0     E & Ci
    ____________________________________________________

    Coa=(A|B)&(C|D)|(A&B|C&D)

    Cob=˜(A&B&C&D)

    SO=A^B^C^D^E^Ci

    CO=˜(A^B^C^D)&(E&Ci)|(A^B^C^D)&(E|Ci)      (14)

The first and second pseudo carry-outs Coa and Cob can be provided by using the construction of the gates G1 to G9 of FIG. 4 except the inverter G2, as mentioned above. Furthermore, the input E and the carry-in Ci have the same meaning as the first and second carry-ins Ci1 and Ci2 of the first preferred embodiment. Accordingly, the upper output CO and the lower output SO can be provided by the gates G10 to G21 of FIG. 4. Therefore, the extended 4-input 2-output adder 110 can be constituted of fewer gates than the extended 4-input 2-output adder 100.

Table 8 is a truth table of the function of the extended 4-input 2-output adder 112 and Formula 15 is a Boolean expression satisfying Table 8. FIG. 9 is a circuit diagram illustrating a configuration of the extended 4-input 2-output adder 112 which satisfies Formula 15.

                  TABLE 8
    ____________________________________________________
    A   B   C   D   SO              Co1   Co2   CO
    ____________________________________________________
    0   0   0   0   Cia & Cib       0     0     ˜Cib
    0   0   0   1   ˜(Cia & Cib)    0     0     Cia
    0   0   1   0   ˜(Cia & Cib)    0     0     Cia
    0   0   1   1   Cia & Cib       1     0     ˜Cib
    0   1   0   0   ˜(Cia & Cib)    0     0     Cia
    0   1   0   1   Cia & Cib       1     0     ˜Cib
    0   1   1   0   Cia & Cib       1     0     ˜Cib
    0   1   1   1   ˜(Cia & Cib)    1     0     Cia
    1   0   0   0   ˜(Cia & Cib)    0     0     Cia
    1   0   0   1   Cia & Cib       1     0     ˜Cib
    1   0   1   0   Cia & Cib       1     0     ˜Cib
    1   0   1   1   ˜(Cia & Cib)    1     0     Cia
    1   1   0   0   Cia & Cib       1     0     ˜Cib
    1   1   0   1   ˜(Cia & Cib)    1     0     Cia
    1   1   1   0   ˜(Cia & Cib)    1     0     Cia
    1   1   1   1   Cia & Cib       1     1     ˜Cib
    ____________________________________________________

    Co1=(A|B)&(C|D)|(A&B|C&D)

    Co2=(A&B&C&D)

    SO=A^B^C^D^(Cia&Cib)

    CO=˜(A^B^C^D)&˜Cib|(A^B^C^D)&Cia      (15)
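The interplay of the three adder types can be illustrated by simulating a serial connection: one adder 110 at the low end, adders 111 in the middle and one adder 112 at the high end. The Python sketch below is a model under stated assumptions, not the circuit of FIG. 7 itself: all function names are hypothetical, Cob of adder 110 is taken as the inversion of A&B&C&D as in Table 7, and bit positions are counted from 0 at the low end of the chain rather than from the thirtieth bit. It checks that the weighted value of all outputs equals the weighted value of all inputs.

```python
import random


def inv(x):
    return x ^ 1


def parts(a, b, c, d):
    """Parity, 'two or more ones' and 'all four ones' of the inputs A to D."""
    x = a ^ b ^ c ^ d
    hi = ((a | b) & (c | d)) | (a & b) | (c & d)
    all4 = a & b & c & d
    return x, hi, all4


def adder_110(a, b, c, d, e, ci):
    """Low end: true carry-ins E and Ci, pseudo carry-outs (Coa, Cob)."""
    x, hi, all4 = parts(a, b, c, d)
    so = x ^ e ^ ci
    co = (inv(x) & (e & ci)) | (x & (e | ci))
    return so, co, hi, inv(all4)


def adder_111(a, b, c, d, cia, cib):
    """Middle: pseudo carry-ins and pseudo carry-outs (Formula 13)."""
    x, hi, all4 = parts(a, b, c, d)
    so = x ^ (cia & cib)
    co = (inv(x) & inv(cib)) | (x & cia)
    return so, co, hi, inv(all4)


def adder_112(a, b, c, d, cia, cib):
    """High end: pseudo carry-ins, true carry-outs (Co1, Co2) (Formula 15)."""
    x, hi, all4 = parts(a, b, c, d)
    so = x ^ (cia & cib)
    co = (inv(x) & inv(cib)) | (x & cia)
    return so, co, hi, all4


def chain_value(bits, e, ci):
    """bits: list of (A, B, C, D) per bit, lowest first, length >= 2."""
    so, co, ca, cb = adder_110(*bits[0], e, ci)
    total = so + 2 * co
    for i, abcd in enumerate(bits[1:-1], start=1):
        so, co, ca, cb = adder_111(*abcd, ca, cb)
        total += (so + 2 * co) << i
    top = len(bits) - 1
    so, co, co1, co2 = adder_112(*bits[top], ca, cb)
    total += (so + 2 * co + 2 * (co1 + co2)) << top
    return total


def input_value(bits, e, ci):
    return e + ci + sum((a + b + c + d) << i for i, (a, b, c, d) in enumerate(bits))


def check_chain(width=5, trials=2000, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        bits = [tuple(rng.randint(0, 1) for _ in range(4)) for _ in range(width)]
        e, ci = rng.randint(0, 1), rng.randint(0, 1)
        assert chain_value(bits, e, ci) == input_value(bits, e, ci)
    return True


check_chain()
```

A width of five corresponds to the thirtieth to thirty-fourth bits of the described arrangement (one adder 110, three adders 111 and one adder 112); the check uses random sampling rather than all 2²² input combinations to keep the run short.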

The extended 4-input 2-output adder 112, which has to output the first and second carry-outs Co1 and Co2, needs the gates G1 to G9 connected in the same manner as shown in FIG. 4. The upper output CO and the lower output SO can be provided by the gates G13 to G19 and G41 and G42, as in the extended 4-input 2-output adder 111. Therefore, the extended 4-input 2-output adder 112 can be constituted of fewer gates than the extended 4-input 2-output adder 100.

Thus, in the second preferred embodiment, the circuit scale is reduced by optimizing the logic for propagation between the extended 4-input 2-output adders, as discussed above, and the delay time is also reduced as compared with the first preferred embodiment.

As can be seen from FIGS. 6, 8 and 9, in any one of the extended 4-input 2-output adders 110 to 112 of the second preferred embodiment, the delay time from the determination of data values of the inputs A to D in the lower-next extended 4-input 2-output adder to the determination of the lower output SO of that extended 4-input 2-output adder can be reduced as compared with the extended 4-input 2-output adder 100 in the first preferred embodiment. In contrast to the first preferred embodiment, which needs the XOR gate G20, the second preferred embodiment has only to include the NAND gate G51, whose delay time is shorter than that of the XOR gate. Therefore, it is estimated that the delay time in the extended 4-input 2-output addition block 1a of the second preferred embodiment is approximately three stages of XORs.

In the tree circuit constructed using the configuration of the second preferred embodiment, the addition is performed in the order of the first stage of the tree circuit (the extended 4-input 2-output addition block 1a, the 4-input 2-output addition blocks 2a to 2c), the second stage (the 4-input 2-output addition blocks 2d and 2e) and the third stage (the 4-input 2-output addition block 2f). Accordingly, the delay time from the determination of the first elements pp₀ to pp₁₅ and the second elements pc₀ to pc₁₅ of the partial product to the determination of the lower output so₇ and the upper output co₇ of the 4-input 2-output addition block 2f as the two eventual intermediate sums is 3+2×3=9 stages of XORs since the critical path goes through one stage of extended 4-input 2-output addition block and two stages of 4-input 2-output addition blocks. That is shorter than the delay time of the first preferred embodiment, i.e., (9+α) stages of XORs (0<α<1).

The Third Preferred Embodiment

FIG. 10 is a block diagram showing part of a configuration of a multiplier in accordance with the third preferred embodiment of the present invention. Both multiplier and multiplicand are 24-bit signed numbers in the two's complement representation and twelve partial products P₀ to P₁₁ are obtained according to the secondary Booth algorithm. This figure does not show a function to generate these partial products, but schematically shows a tree circuit which compresses the intermediate sums to eventually generate two eventual intermediate sums in addition of the partial products. The partial product P_(j) depends on the first element pp_(j) of 25-bit width, the second element pc_(j) of 1-bit width and 2j representing the least significant bit position, on the basis of Formula 5 as discussed in the background art.

The tree circuit of the third preferred embodiment is constituted of a circuit block 13a for parallelly adding three input data of plural-number-bit width and one input data of 1-bit width (the circuit block will be referred to as "extended 3-input 2-output addition block" hereinafter), 3-input 2-output addition blocks 14a to 14c and 4-input 2-output addition blocks 12a to 12c.

The extended 3-input 2-output addition block 13a receives the second element pc₁₁ of the partial product and the first elements pp₀ to pp₂ of the partial product and outputs the upper output co₁₁ and lower output so₁₁ as intermediate sums. The 3-input 2-output addition block 14a receives the first elements pp₃ to pp₅ of the partial product and outputs the upper output co₁₂ and lower output so₁₂ as intermediate sums. The 3-input 2-output addition block 14b receives the first elements pp₆ to pp₈ of the partial product and outputs the upper output co₁₃ and lower output so₁₃ as intermediate sums. The 3-input 2-output addition block 14c receives the first elements pp₉ to pp₁₁ of the partial product and outputs the upper output co₁₄ and lower output so₁₄ as intermediate sums.

The 4-input 2-output addition block 12a receives the upper outputs co₁₁ and co₁₂ and lower outputs so₁₁ and so₁₂ and outputs the upper output co₁₅ and lower output so₁₅ as intermediate sums. The 4-input 2-output addition block 12b receives the upper outputs co₁₃ and co₁₄ and lower outputs so₁₃ and so₁₄ and outputs the upper output co₁₆ and lower output so₁₆ as intermediate sums. The 4-input 2-output addition block 12c receives the upper outputs co₁₅ and co₁₆ and lower outputs so₁₅ and so₁₆ and outputs the upper output co₁₇ and lower output so₁₇ as intermediate sums. The lower output so₁₇ and the upper output co₁₇ are finally added by the final addition block (not shown) to provide the multiplication result.

FIGS. 11A and 11B are block diagrams cooperatively showing a configuration of the extended 3-input 2-output addition block 13a. FIG. 11A is continuous with FIG. 11B at a virtual line Q15--Q15.

The extended 3-input 2-output addition block 13a, which performs parallel addition of 26-bit data, has a configuration where five extended 3-input 2-output adders 300, each for one bit, are located on the twenty-second to twenty-sixth bits, twenty 3-input 2-output adders 400, each for one bit, are located on the second to twenty-first bits and one 3-input 2-output adder 400 for one bit is located on the twenty-seventh bit.

The sum of the three 1-bit inputs A, B and C is at most "3" in decimal notation and it is representable using two 1-bit outputs SO and CO. Therefore, in the twenty-first bit and lower, no carry is propagated between bits and no connection is needed between the 3-input 2-output adders 400.

The second element pc₁₁ of the partial product P₁₁ is inputted to the carry-in Ci of the extended 3-input 2-output adder 300 on the twenty-second bit. The carry-in Ci is regarded as a parity of the three first elements pp₀ <22>, pp₁ <22> and pp₂ <22> in weight of the twenty-second bit position, complying with the first principle.

Performing the "one-addition technique" as a complement to the first and second principles, a logic inversion of the first element pp₀ <24> is inputted to the input C of the extended 3-input 2-output adders 300 on the twenty-fourth and twenty-fifth bits and the first element pp₀ <24> is inputted to the input C of the extended 3-input 2-output adder 300 on the twenty-sixth bit. Therefore, the extended 3-input 2-output adders 300, each for one bit, are needed up to the twenty-sixth bit. In each of the twenty-second to twenty-sixth bits, the carry-out Co of the extended 3-input 2-output adder 300 is inputted to the higher-next adder as the carry-in Ci.

In the twenty-seventh bit, since the partial products are compressed, the first element pp₂ <27> and "1" for performing the "one-addition technique" are inputted to the inputs A and B, and further, the carry-out Co of the extended 3-input 2-output adder 300 on the twenty-sixth bit is inputted to the input C.

Now, the above extended 3-input 2-output adder 300 will be discussed. The extended 3-input 2-output adder 300 receives four 1-bit data and outputs the lower output SO for its bit position and two outputs for the higher-next bit, i.e., the carry-out Co and the upper output CO.

Accordingly, the following expression is true:

    A+B+C+Ci=2(Co+CO)+SO                                       (16)

On the other hand, the lower output SO depends on whether the output from the extended 3-input 2-output adder 300 is an even or odd number in decimal notation; in other words, it depends on whether the number of "1"s in the three inputs A to C and the carry-in Ci is even or odd. A truth table that the extended 3-input 2-output adder 300 should satisfy is shown in Table 9.

                  TABLE 9
    ______________________________________
    A   B   C   SO     (Co, CO)
    ______________________________________
    0   0   0   Ci     (0, 0)
    0   0   1   ˜Ci    (0, Ci)
    0   1   0   ˜Ci    (0, Ci)
    0   1   1   Ci     (1, 0) or (0, 1)
    1   0   0   ˜Ci    (0, Ci)
    1   0   1   Ci     (1, 0) or (0, 1)
    1   1   0   Ci     (1, 0) or (0, 1)
    1   1   1   ˜Ci    (1, Ci)
    ______________________________________

When the three inputs A, B and C have two "1"s, either the carry-out Co or the upper output CO has to be "1" and the other has to be "0". Since this choice can be made independently for each of the three such rows, Table 9 represents 2³=8 functions. From the truth table in Table 9, the logic of the lower output SO is given using a Boolean expression as

    SO=A^B^C^Ci                                                (17)

Table 10 is a truth table illustrating one of the functions that the extended 3-input 2-output adder 300 should satisfy.

                  TABLE 10
    ______________________________________
    A   B   C   SO     Co   CO
    ______________________________________
    0   0   0   Ci     0    0
    0   0   1   ˜Ci    0    Ci
    0   1   0   ˜Ci    0    Ci
    0   1   1   Ci     1    0
    1   0   0   ˜Ci    0    Ci
    1   0   1   Ci     1    0
    1   1   0   Ci     1    0
    1   1   1   ˜Ci    1    Ci
    ______________________________________

The truth table is given using Boolean expressions as

    Co=A&B|B&C|C&A

    CO=(A^B^C)&Ci

    SO=A^B^C^Ci                                                (18)
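Formula 18 can be checked against the arithmetic relation of Formula 16 for all sixteen input combinations. The following Python sketch is only an illustration; the function name `adder_300` and the tuple order of its return value are hypothetical.

```python
from itertools import product


def adder_300(a, b, c, ci):
    """Formula 18; returns (SO, Co, CO) for 1-bit inputs."""
    carry_out = (a & b) | (b & c) | (c & a)   # Co: majority of A, B, C
    x = a ^ b ^ c
    upper_out = x & ci                        # CO
    so = x ^ ci                               # SO
    return so, carry_out, upper_out


for a, b, c, ci in product((0, 1), repeat=4):
    so, carry_out, upper_out = adder_300(a, b, c, ci)
    # Formula 16: A + B + C + Ci = 2(Co + CO) + SO
    assert a + b + c + ci == 2 * (carry_out + upper_out) + so
```

Because Co depends only on the inputs A, B and C and not on the carry-in Ci, a carry arriving from the lower-next bit does not ripple further than one bit position.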

FIG. 12 is a circuit diagram of an exemplary circuit satisfying the function of Table 10. To reduce the circuit scale and speed up the operation, logics of both the carry-in Ci and the carry-out Co are inverted.

An OR gate G61 receives the inputs A and B and an AND gate G62 receives an output from the OR gate G61 and the input C. An AND gate G63 receives the inputs A and B, and a NOR gate G64 receives outputs from the AND gates G62 and G63 and outputs an inversion of the carry-out Co. The gates G61 to G64 can be constructed as a compound gate.

An XOR gate G65 receives the inputs A and B and an XNOR gate G67 receives an output from the XOR gate G65 and the input C. A NOR gate G68 receives an output from the XNOR gate G67 and the inversion of the carry-in Ci and outputs the upper output CO.

An XNOR gate G66 receives the input C and the inversion of the carry-in Ci. An XOR gate G69 receives an output from the XNOR gate G66 and an output from the XOR gate G65 and outputs the lower output SO.
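The gate connections described for FIG. 12, with the carry-in and carry-out handled in inverted logic, can be simulated and compared with Table 10. The Python sketch below is a minimal model under stated assumptions: the function names and the signal names `ci_n` and `co_n` (inverted carry-in and inverted carry-out) are hypothetical, and the gate labels in the comments follow the description above.

```python
from itertools import product


def nor(x, y):
    return 1 - (x | y)


def xnor(x, y):
    return 1 - (x ^ y)


def fig12(a, b, c, ci_n):
    """Gate-level model; ci_n is the inverted carry-in.

    Returns (SO, co_n, CO), where co_n is the inverted carry-out.
    """
    g61 = a | b               # OR gate G61
    g62 = g61 & c             # AND gate G62
    g63 = a & b               # AND gate G63
    co_n = nor(g62, g63)      # NOR gate G64 -> inverted carry-out
    g65 = a ^ b               # XOR gate G65
    g67 = xnor(g65, c)        # XNOR gate G67
    upper = nor(g67, ci_n)    # NOR gate G68 -> upper output CO
    g66 = xnor(c, ci_n)       # XNOR gate G66
    so = g66 ^ g65            # XOR gate G69 -> lower output SO
    return so, co_n, upper


def table10(a, b, c, ci):
    """Reference values (SO, Co, CO) written from the Boolean form of Table 10."""
    x = a ^ b ^ c
    return x ^ ci, (a & b) | (b & c) | (c & a), x & ci


for a, b, c, ci in product((0, 1), repeat=4):
    so, co_n, upper = fig12(a, b, c, 1 - ci)
    assert (so, 1 - co_n, upper) == table10(a, b, c, ci)
```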

The critical path of the extended 3-input 2-output adder 300 goes from the inputs A, B and C of the lower-next bit to the upper output CO on its bit position and the delay time is between two stages of XORs and three stages of XORs.

The addition is performed in the order of the first stage of the tree circuit of FIG. 10 (the extended 3-input 2-output addition block 13a, the 3-input 2-output addition blocks 14a to 14c), the second stage (the 4-input 2-output addition blocks 12a and 12b) and the third stage (the 4-input 2-output addition block 12c). Accordingly, the delay time from the determination of the first elements pp₀ to pp₁₁ and the second elements pc₀ to pc₁₁ of the partial product to the determination of the lower output so₁₇ and the upper output co₁₇ of the 4-input 2-output addition block 12c as the two eventual intermediate sums is (2+α)+2×3=8+α stages of XORs (0<α<1) since the critical path goes through one stage of extended 3-input 2-output addition block and two stages of 4-input 2-output addition blocks. Thus, the delay time is between eight stages of XORs and nine stages of XORs and higher-speed operation is achieved as compared with the background art.

In the third preferred embodiment, like the first preferred embodiment, the minimum circuit scale is achieved when the highest one of the second elements pc_(j) generated according to the secondary Booth algorithm and the one of the first elements pp_(j) whose most significant bit is the lowest are inputted to the same extended 3-input 2-output addition block.

Supplemental Description

In the first to third preferred embodiments, the present invention has been discussed, taking the 32×32 multiplier and the 24×24 multiplier using the secondary Booth algorithm as specific examples. A general aspect of the present invention will now be discussed.

Of the two inputs of the multiplier circuit, the one to be encoded according to the secondary Booth algorithm is assumed to be a multiplier and the other is assumed to be a multiplicand. If the multiplier has 2n-bit or (2n-1)-bit width (n is an integer equal to or more than two), n partial products are generated. Assuming that each partial product is P_(j) (j=0 to n-1), when the first element pp_(j) of (the bit width of the multiplicand+1)-bit width and the second element pc_(j) of 1-bit width, which is added to the least significant digit of the partial product if the partial product is negative, are introduced, Formula 3 is true.

As discussed in the background art, when a tree circuit is formed using the regular 4-input 2-output addition blocks and the regular 3-input 2-output addition blocks for adding up n partial products to output two intermediate sums, the (n-1) second elements pc_(j) (j=0 to n-2) may be inputted to available terminals of the addition blocks, but the second element pc_(n-1) has no available terminal to receive it. For this reason, the second element pc_(n-1) is added separately (for example, shown in FIG. 13) or all of the second elements pc_(j) (j=0 to n-1) are added up together (for example, shown in FIG. 19). Thus, the number of input data of the tree circuit is regarded as (n+1).

Only when the number of input data of the tree circuit consisting of the 4-input 2-output addition blocks and the 3-input 2-output addition blocks is 2^(k)·3^(h) (k=0, 1, 2, . . . , h=0, 1, 2, . . . ), a "dense" tree circuit can be constructed ("dense" refers to a condition where the input data of the addition blocks in the same stage arrive at the same time to achieve ultimate parallel operation of the circuit). The reason is as follows.

Since the addition blocks constituting the tree circuit each have two outputs, the final stage of the "dense" tree circuit is necessarily a 4-input 2-output addition block. In the previous stage, either two 4-input 2-output addition blocks or two 3-input 2-output addition blocks are provided. In other words, the number of inputs of the addition blocks previous to the final addition block is eight or six. Tracing back from the final addition stage, it is found that the number of inputs of the "dense" tree circuit is 2^(k)·3^(h).

If the number of partial products n is 2^(k)·3^(h), the number of input data (n+1) is not 2^(k)·3^(h) and therefore it is impossible to form the "dense" tree circuit in the background art.
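The input-count condition can be illustrated with the two multipliers discussed in the embodiments. The Python sketch below is only an illustration; the function names `is_dense_input_count` and `booth_partial_products` are hypothetical, and the latter simply restates that a 2n-bit or (2n-1)-bit multiplier yields n partial products.

```python
def is_dense_input_count(m):
    """True if m can be written as 2**k * 3**h with k, h >= 0."""
    if m < 1:
        return False
    for p in (2, 3):
        while m % p == 0:
            m //= p
    return m == 1


def booth_partial_products(multiplier_bits):
    """Number of partial products n for a 2n-bit or (2n-1)-bit multiplier."""
    return (multiplier_bits + 1) // 2


for width in (32, 24):
    n = booth_partial_products(width)
    # n inputs allow a "dense" tree, (n + 1) inputs do not.
    assert is_dense_input_count(n)
    assert not is_dense_input_count(n + 1)
```

For the 32-bit multiplier, n is 16 and (n+1) is 17; for the 24-bit multiplier, n is 12 and (n+1) is 13. In both cases only the n-input tree satisfies the condition.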

According to the present invention, one extended 4-input 2-output addition block or one extended 3-input 2-output addition block is used, so that in effect one more available terminal is provided as compared with the regular 4-input 2-output addition block or regular 3-input 2-output addition block. Therefore, the input data of the addition blocks in the same stage can arrive at the same time. Thus, the "dense" tree circuit can be formed, thereby reducing the delay time.

If the number of partial products n is not 2^(k)·3^(h), the "dense" tree circuit cannot be formed because of an intrinsic property of the number of partial products n, and it is impossible to reduce the delay time even if the tree circuit having (n+1) inputs is formed according to the present invention.

Thus, if the multiplier (the input to be encoded according to the secondary Booth algorithm) has a width of 2·2^(k)·3^(h) bits or (2·2^(k)·3^(h)-1) bits, and the tree circuit for adding up a plurality of partial products in the multiplier circuit according to the secondary Booth algorithm to output the two eventual intermediate sums is formed using the extended 4-input 2-output addition block or the extended 3-input 2-output addition block of the present invention, then the input data of the addition blocks in the same stage of the tree circuit arrive at the same time, the number of logical stages in the critical path of the tree circuit is reduced, the parallel operation of the circuit is improved and higher-speed operation of the multiplier is achieved.
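The width condition above can be illustrated with a short Python sketch. It assumes that the secondary Booth algorithm generates one partial product per two multiplier bits, rounded up, which is consistent with the two widths named in the paragraph but is stated here as an assumption; all function names are hypothetical.

```python
def is_dense_input_count(n):
    """Return True if n can be written as 2**k * 3**h with k, h >= 0."""
    if n < 1:
        return False
    for p in (2, 3):
        while n % p == 0:
            n //= p
    return n == 1


def booth_partial_products(width):
    """Assumed number of partial products for a `width`-bit multiplier:
    one per pair of bits, rounded up."""
    return (width + 1) // 2


def dense_tree_possible(width):
    """True if the partial-product count has the form 2**k * 3**h, i.e. the
    width is 2*2**k*3**h or 2*2**k*3**h - 1 bits."""
    return is_dense_input_count(booth_partial_products(width))


for width in (16, 20, 23, 24, 31, 32, 40, 64):
    print(width, booth_partial_products(width), dense_tree_possible(width))
```

Under this assumption a 32-bit or 31-bit multiplier gives 16 partial products and satisfies the condition, whereas a 20-bit multiplier gives 10 partial products and does not.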

When the highest one of the second elements pc_(j) and the first element pp_(j) of which the most significant bit is the lowest are inputted to the same extended 4-input 2-output addition block or the same extended 3-input 2-output addition block, the best area-efficiency is achieved.

While the invention has been shown and described in detail, the foregoing description is in all aspects illustrative and not restrictive. It is therefore understood that numerous modifications and variations can be devised without departing from the scope of the invention.

I claim:
 1. A tree circuit, which performs a tournament addition on the basis of a plurality of partial products generated according to Booth algorithm, generating intermediate sums to be compressed, to output a pair of eventual intermediate sums, comprising: regular addition blocks for adding a plurality of plural-number-bit data to output a pair of said intermediate sums; and an extended addition block for adding a plurality of plural-number-bit data and one-bit data to output a pair of said intermediate sums.
 2. The tree circuit of claim 1, wherein each of said plurality of partial products is expressed as a product obtained by multiplying a sum of a first element of a plurality of bits and a second element of one bit by a scale, and said extended addition block receives said plurality of partial products and further receives said second element which belongs to one of said plurality of partial products other than those to be inputted thereto.
 3. The tree circuit of claim 2, wherein said second element inputted to said extended addition block belongs to the partial product which has the largest scale among said plurality of partial products.
 4. The tree circuit of claim 3, wherein the partial product which has the smallest scale among said plurality of partial products is inputted to said extended addition block.
 5. The tree circuit of claim 4, wherein said extended addition block has extended adders, the number of which is a predetermined number, located on a specific bit position which is the bit position of said second element inputted therein and higher; and regular adders located lower than said specific bit position, and said extended adders each have one more upward-propagation outputs for outputting data to the higher-next bit as compared with said regular adders which constitute said regular addition block.
 6. The tree circuit of claim 5, wherein said extended addition block further has an adder higher than said extended adders, and said adder located higher next to the highest one of said extended adders receives one of said upward-propagation outputs as an input other than a carry-in.
 7. The tree circuit of claim 5, wherein said extended adders each have three inputs other than said upward-propagation outputs given from the lower-next bit position.
 8. The tree circuit of claim 5, wherein said extended adders each have four inputs other than said upward-propagation outputs given from the lower-next bit position and one of said upward-propagation outputs takes either of different values depending on whether all of said four inputs have "1"s or not.
 9. The tree circuit of claim 8, wherein said upward-propagation outputs are carry-outs.
 10. The tree circuit of claim 8, wherein said upward-propagation outputs propagating between a plurality of said extended adders are generated as a pair of pseudo carry-outs and can be expressed as results of two predetermined arithmetic operations performed for a pair of carry-outs generated in said regular adders, and said carry-outs are commutative in both said two predetermined arithmetic operations.
 11. The tree circuit of claim 10, wherein said pseudo carry-outs are a logic sum of said pair of carry-outs and an inversion of a logic product of said pair of carry-outs.
 12. The tree circuit of claim 10, wherein the extended adder located on said specific bit position receives a carry-out from the lower-next bit position and said second element inputted to said extended addition block and propagates said pseudo carry-outs to the extended adder located on the higher-next bit position.
 13. The tree circuit of claim 12, wherein said extended addition block further has a regular adder higher than said extended adders, and the highest one of said extended adders receives said pair of pseudo carry-outs from the lower-next bit position and outputs a pair of carry-outs to said regular adder located on the higher-next bit position.